Article ID | Journal | Published Year | Pages |
---|---|---|---|
7151536 | Systems & Control Letters | 2018 | 7 Pages |
Abstract
A reinforcement learning algorithm is proposed in order to solve a multi-criterion Markov decision process, i.e., an MDP with a vector running cost. Specifically, it combines a Q-learning scheme for a weighted linear combination of the prescribed running costs with an incremental version of replicator dynamics that updates the weights. The objective is that the time averaged vector cost meets prescribed asymptotic bounds. Under mild assumptions, it is shown that the scheme achieves the desired objective.
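The coupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the update rules, step sizes, or the average-cost formulation. The sketch makes several assumptions for illustration only: a discounted tabular Q-learning update stands in for whatever Q-learning scheme the paper uses, the replicator "payoff" of each cost component is taken to be the excess of its running average over its bound, and the toy MDP, the bounds, and all function and parameter names (`scalarize`, `q_update`, `replicator_step`, `run`, `alpha`, `gamma`, `beta`) are hypothetical.

```python
import random


def scalarize(cost_vec, w):
    """Weighted linear combination of the running-cost vector."""
    return sum(wi * ci for wi, ci in zip(w, cost_vec))


def q_update(Q, s, a, cost_vec, s_next, w, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step (cost minimisation) on the scalarized cost.

    Assumption: a discounted update is used here for simplicity, although the
    abstract concerns time-averaged costs.
    """
    target = scalarize(cost_vec, w) + gamma * min(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def replicator_step(w, payoff, beta=0.01):
    """Incremental (Euler) replicator step: w_i += beta * w_i * (payoff_i - w.payoff)."""
    mean = sum(wi * fi for wi, fi in zip(w, payoff))
    new_w = [wi + beta * wi * (fi - mean) for wi, fi in zip(w, payoff)]
    total = sum(new_w)  # renormalise to guard against floating-point drift
    return [wi / total for wi in new_w]


def run(steps=5000, seed=0):
    rng = random.Random(seed)
    n_states, n_actions, n_costs = 3, 2, 2
    # Hypothetical toy MDP: random cost vectors with entries in [0, 1).
    C = [[[rng.random() for _ in range(n_costs)]
          for _ in range(n_actions)] for _ in range(n_states)]
    bounds = [0.5, 0.5]  # hypothetical bounds on the time-averaged costs
    Q = [[0.0] * n_actions for _ in range(n_states)]
    w = [1.0 / n_costs] * n_costs  # start at the centre of the simplex
    avg_cost = [0.0] * n_costs
    s = 0
    for t in range(1, steps + 1):
        # Epsilon-greedy action selection (greedy = smallest Q, since costs are minimised).
        if rng.random() < 0.1:
            a = rng.randrange(n_actions)
        else:
            a = min(range(n_actions), key=lambda b: Q[s][b])
        # Hypothetical transition rule that depends on both state and action.
        s_next = (s + a + rng.randrange(2)) % n_states
        q_update(Q, s, a, C[s][a], s_next, w)
        # Running time average of the vector cost.
        avg_cost = [m + (c - m) / t for m, c in zip(avg_cost, C[s][a])]
        # Assumed payoff: components exceeding their bound gain weight.
        w = replicator_step(w, [m - b for m, b in zip(avg_cost, bounds)])
        s = s_next
    return Q, w, avg_cost


if __name__ == "__main__":
    Q, w, avg_cost = run()
    print("weights:", w)
    print("time-averaged costs:", avg_cost)
```

The replicator step keeps the weights on the probability simplex, so the scalarized cost is always a convex combination of the individual running costs. Whether the averaged costs in this toy run actually meet the chosen bounds is not guaranteed by the sketch.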
Related Topics
Physical Sciences and Engineering
Engineering
Control and Systems Engineering
Authors
Suhail M. Shah, Vivek S. Borkar