Article ID: 7151536
Journal: Systems & Control Letters
Published Year: 2018
Pages: 7
File Type: PDF
Abstract
A reinforcement learning algorithm is proposed to solve a multi-criterion Markov decision process, i.e., an MDP with a vector-valued running cost. Specifically, it combines a Q-learning scheme for a weighted linear combination of the prescribed running costs with an incremental version of replicator dynamics that updates the weights. The objective is for the time-averaged vector cost to meet prescribed asymptotic bounds. Under mild assumptions, it is shown that the scheme achieves this objective.
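The abstract does not give the update rules, so the following is only a minimal sketch of how the two coupled iterations might look, not the authors' algorithm. Everything concrete in it is an assumption made for illustration: the random MDP (`P`, `C`), the bounds, the step sizes, the epsilon-greedy exploration, the relative-value (average-cost) form of the Q-learning update with reference pair `Q[0, 0]`, and the use of the instantaneous constraint violation `c - bounds` as the replicator payoff.

```python
import numpy as np


def replicator_step(w, payoff, step):
    """One incremental (Euler) replicator update of weights on the simplex.

    Each weight grows in proportion to how much its payoff exceeds the
    weighted-average payoff; the sum of the weights is preserved.
    """
    avg = float(w @ payoff)
    w_new = w * (1.0 + step * (payoff - avg))
    w_new = np.clip(w_new, 1e-12, None)  # guard against round-off below zero
    return w_new / w_new.sum()


def run(num_steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    n_states, n_actions, n_costs = 4, 2, 2

    # Hypothetical MDP: random transition kernel and vector running cost.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    C = rng.uniform(0.0, 1.0, size=(n_states, n_actions, n_costs))
    bounds = np.array([0.6, 0.6])  # assumed bounds on the average costs

    Q = np.zeros((n_states, n_actions))
    w = np.full(n_costs, 1.0 / n_costs)
    avg_cost = np.zeros(n_costs)
    s = 0

    for n in range(1, num_steps + 1):
        # Epsilon-greedy action selection (costs are minimized).
        if rng.random() < 0.1:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmin(Q[s]))

        s_next = int(rng.choice(n_states, p=P[s, a]))
        c = C[s, a]
        avg_cost += (c - avg_cost) / n  # running time average of vector cost

        # Q-learning on the scalarized cost w . c (relative-value form,
        # with Q[0, 0] as an assumed reference offset).
        q_step = 1.0 / (1.0 + n / 100.0)
        target = float(w @ c) + Q[s_next].min() - Q[0, 0]
        Q[s, a] += q_step * (target - Q[s, a])

        # Weight update: criteria that exceed their bound gain weight.
        w_step = 0.05 / (1.0 + n / 1000.0)
        w = replicator_step(w, c - bounds, w_step)

        s = s_next

    return w, avg_cost, Q


if __name__ == "__main__":
    w, avg_cost, Q = run()
    print("weights:", w)
    print("time-averaged cost:", avg_cost)
```

The sketch makes no claim that the bounds are met on this toy problem; it only shows the structure of a scalarized Q-learning step interleaved with a simplex-preserving weight update.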