Stochastic linear quadratic optimal control for model-free discrete-time systems based on Q-learning algorithm

Journal: Neurocomputing (2018)
Article ID: 6863525; 26 pages (PDF)

Abstract

Solving the stochastic linear quadratic (SLQ) optimal control problem generally requires full knowledge of the system dynamics. In this paper, a Q-learning iteration algorithm is adopted to solve the problem for model-free discrete-time systems. First, the condition for the well-posedness of the SLQ problem is given, and the stochastic problem is transformed into a deterministic one so that it can be solved. Second, during the Q-learning iterations, the H matrix sequence and the control gain matrix sequence are obtained without knowledge of the system parameters, and the convergence of both sequences is proved. Finally, two simulation examples are provided to demonstrate the effectiveness of the Q-learning algorithm.
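As an illustration of the kind of pipeline the abstract describes, the Python/NumPy sketch below runs Q-learning value iteration on the H matrix of a quadratic Q-function Q(x, u) = [x; u]' H [x; u], producing an H sequence and a gain sequence K = -H_uu^{-1} H_ux from sampled transitions only. It then compares the learned gain with a model-based stochastic Riccati recursion. Everything specific here is an assumption for illustration, not taken from the paper: the system form x_{k+1} = A x_k + B u_k + (C x_k + D u_k) w_k with scalar i.i.d. noise w_k ~ N(0, 1), the numerical matrices, the positive-definite weights (chosen so that well-posedness is not in question), and the hypothetical black-box simulator `simulate_step`, which is assumed to be resettable to arbitrary states. The paper's own iteration may differ in detail.

```python
import numpy as np

# Hypothetical example system (assumption, not from the paper):
#   x_{k+1} = A x_k + B u_k + (C x_k + D u_k) w_k,  w_k ~ N(0, 1) i.i.d.
A = np.array([[1.0, 0.2], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.2, 0.0], [0.0, 0.2]])
D = np.array([[0.0], [0.2]])
Qc = np.eye(2)          # state weight (positive definite)
Rc = np.array([[1.0]])  # control weight (positive definite)
n, m = B.shape
p = n + m


def simulate_step(X, U, rng):
    """Hypothetical black-box simulator: draws the noise internally and
    returns next states; the learner never reads A, B, C, D directly."""
    W = rng.standard_normal(X.shape[0])
    return X @ A.T + U @ B.T + W[:, None] * (X @ C.T + U @ D.T)


def quad_features(Z):
    """Rows phi(z) with z' H z = phi(z) @ h, h = upper triangle of symmetric H."""
    iu = np.triu_indices(Z.shape[1])
    outer = Z[:, :, None] * Z[:, None, :]
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)  # off-diagonal terms appear twice
    return outer[:, iu[0], iu[1]] * scale


def unvec(h, dim):
    """Rebuild the symmetric matrix H from its upper-triangular entries h."""
    H = np.zeros((dim, dim))
    H[np.triu_indices(dim)] = h
    return H + H.T - np.diag(np.diag(H))


def q_learning(num_samples=20000, iters=200, seed=0):
    """Model-free Q-learning value iteration on H from one batch of data."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((num_samples, n))  # states the simulator is reset to
    U = rng.standard_normal((num_samples, m))  # exploratory actions
    Xn = simulate_step(X, U, rng)              # observed next states
    r = np.einsum("ij,jk,ik->i", X, Qc, X) + np.einsum("ij,jk,ik->i", U, Rc, U)
    Phi_pinv = np.linalg.pinv(quad_features(np.hstack([X, U])))

    H = np.zeros((p, p))  # H_0 = 0
    K = np.zeros((m, n))  # its greedy gain is irrelevant since H_0 = 0
    for _ in range(iters):
        Zn = np.hstack([Xn, Xn @ K.T])                  # greedy next action u' = K x'
        y = r + np.einsum("ij,jk,ik->i", Zn, H, Zn)     # Bellman targets under H_j
        H = unvec(Phi_pinv @ y, p)                      # least-squares fit gives H_{j+1}
        K = -np.linalg.solve(H[n:, n:], H[n:, :n])      # K_{j+1} = -H_uu^{-1} H_ux
    return H, K


def model_based(iters=500):
    """Reference: same H recursion computed from the (assumed) model,
    H = diag(Q, R) + [A B]' P [A B] + [C D]' P [C D]."""
    G, F = np.hstack([A, B]), np.hstack([C, D])
    H0 = np.block([[Qc, np.zeros((n, m))], [np.zeros((m, n)), Rc]])
    P = np.zeros((n, n))
    for _ in range(iters):
        H = H0 + G.T @ P @ G + F.T @ P @ F
        K = -np.linalg.solve(H[n:, n:], H[n:, :n])
        M = np.vstack([np.eye(n), K])
        P = M.T @ H @ M  # equals H_xx - H_xu H_uu^{-1} H_ux
    return P, H, K


def riccati_residual(P):
    """Residual of the stochastic algebraic Riccati equation at P."""
    S = Rc + B.T @ P @ B + D.T @ P @ D
    L = A.T @ P @ B + C.T @ P @ D
    return Qc + A.T @ P @ A + C.T @ P @ C - L @ np.linalg.solve(S, L.T) - P


if __name__ == "__main__":
    _, _, K_ref = model_based()
    _, K_hat = q_learning()
    print("model-based gain:", K_ref)
    print("Q-learning gain: ", K_hat)
```

Reusing one batch of transitions at every iteration keeps the regressors fixed, so each H update is a single product with a precomputed pseudo-inverse. Because the regressors depend only on (x_k, u_k), the zero-mean noise in the Bellman targets does not bias the least-squares fit for a given H_j; it only adds variance, which shrinks as the batch grows.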

Keywords

Q-learning; Well-posedness