Q-learning for Markov decision processes with a satisfiability criterion

Article ID	Journal	Published Year	Pages	File Type
7151536	Systems & Control Letters	2018	7 Pages	PDF

Abstract

A reinforcement learning algorithm is proposed for solving a multi-criterion Markov decision process (MDP), i.e., an MDP with a vector-valued running cost. Specifically, it combines a Q-learning scheme for a weighted linear combination of the prescribed running costs with an incremental version of replicator dynamics that updates the weights. The objective is for the time-averaged vector cost to meet prescribed asymptotic bounds. Under mild assumptions, the scheme is shown to achieve this objective.
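
The coupling described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact updates, so everything below is an assumption made for illustration: the toy MDP (`P`, `C`), the bounds, the step sizes, the discounted (rather than time-averaged) Q-learning target, and the choice of "running average cost minus bound" as the replicator fitness are all hypothetical and are not taken from the paper. The sketch only shows the general shape: tabular Q-learning on a weighted sum of costs, with the weights moved on a slower time scale by a discretized replicator step.

```python
import numpy as np


def replicator_step(w, fitness, step):
    """One Euler step of replicator dynamics, renormalized onto the simplex."""
    avg_fitness = w @ fitness
    w_new = w + step * w * (fitness - avg_fitness)
    w_new = np.clip(w_new, 1e-8, None)  # keep every weight strictly positive
    return w_new / w_new.sum()


def run(n_steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    n_states, n_actions, n_costs = 3, 2, 2

    # Hypothetical toy MDP: random transition kernel and vector running cost.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # (S, A, S)
    C = rng.uniform(0.0, 1.0, size=(n_states, n_actions, n_costs))    # (S, A, K)
    bounds = np.array([0.6, 0.6])  # assumed target bounds on average costs
    gamma = 0.95                   # assumption: discounted simplification

    Q = np.zeros((n_states, n_actions))
    w = np.full(n_costs, 1.0 / n_costs)  # weights on the probability simplex
    avg_cost = np.zeros(n_costs)         # running time-average of vector cost
    s = 0

    for n in range(1, n_steps + 1):
        # Epsilon-greedy action choice (costs are minimized).
        if rng.random() < 0.1:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmin(Q[s]))
        s_next = int(rng.choice(n_states, p=P[s, a]))
        c = C[s, a]

        fast = 1.0 / (1.0 + n / 100.0)  # Q-learning step size (faster)
        slow = 1.0 / (100.0 + n)        # weight step size (slower)

        # Q-learning update on the scalarized cost w . c.
        target = w @ c + gamma * Q[s_next].min()
        Q[s, a] += fast * (target - Q[s, a])

        # Track the time-averaged vector cost.
        avg_cost += (c - avg_cost) / n

        # Shift weight toward cost components that exceed their bounds.
        w = replicator_step(w, avg_cost - bounds, slow)
        s = s_next

    return Q, w, avg_cost


if __name__ == "__main__":
    Q, w, avg_cost = run()
    print("weights:", w)
    print("time-averaged costs:", avg_cost)
```

Running the script prints the final weights and the empirical time-averaged costs for the toy problem. Whether those averages actually fall within the assumed bounds depends on the toy MDP and step sizes chosen here; the sketch carries none of the paper's convergence guarantees.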

Keywords

Q-learning; Replicator dynamics; Satisfiability; Markov decision processes; Differential inclusions