| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 756363 | Systems & Control Letters | 2012 | 6 | |
We provide a bound on the first moment of the error in the Q-function estimate resulting from fixed step-size algorithms applied to finite state-space, discounted-reward Markov decision problems. Motivated by Tsitsiklis’ proof for the decreasing step-size case, we decompose the Q-learning update equations into a dynamical system driven by a noise sequence and another dynamical system whose state variable is the Q-learning error, i.e., the difference between the true Q-function and its estimate. A natural persistence of excitation condition allows us to sample the system periodically and derive a simple scalar difference equation from which the convergence properties of the Q-learning algorithm and bounds on its error can be derived.
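To make the setting concrete, below is a minimal sketch (not the paper's construction) of fixed step-size Q-learning on a small synthetic finite MDP, tracking the sup-norm error against the true Q-function computed by value iteration. The toy MDP, the uniformly random state-action sampling (a simple stand-in for a persistence of excitation condition, since it updates every pair infinitely often), and all names such as `n_states`, `gamma`, and `alpha` are illustrative assumptions. The behavior it exhibits matches the abstract's theme: with a constant step size the error does not vanish but settles near a step-size-dependent level.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.05  # discount factor and fixed step size

# Toy MDP: random transition kernel P[s, a, s'] and rewards R[s, a]
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# "True" Q-function via value iteration on the Bellman optimality operator
Q_star = np.zeros((n_states, n_actions))
for _ in range(10_000):
    Q_star = R + gamma * P @ Q_star.max(axis=1)

# Fixed step-size Q-learning; uniformly random (s, a) sampling ensures
# every state-action pair is updated infinitely often
Q = np.zeros((n_states, n_actions))
for t in range(200_000):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    s_next = rng.choice(n_states, p=P[s, a])
    target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])  # constant alpha: error plateaus

# With fixed alpha the sup-norm error settles in an alpha-dependent band
# rather than converging to zero, as the first-moment bound suggests
print("sup-norm error:", np.abs(Q - Q_star).max())
```

Decreasing `alpha` in this sketch shrinks the residual error band at the cost of slower transient convergence, which is the trade-off a fixed step-size error bound quantifies.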