| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 4942682 | Engineering Applications of Artificial Intelligence | 2017 | 19 Pages | |
Abstract
To overcome the numerical stability problems that inherently occur in recursive least-squares (RLS)-based adaptive dynamic programming paradigms for online optimal control design, a novel method is proposed to improve the state-value function approximations in online algorithms for discrete linear quadratic regulator (DLQR) control system design. The resulting algorithms are embedded in actor-critic architectures based on heuristic dynamic programming (HDP). The proposed solution relies on unitary transformations and QR decomposition, integrated into the critic network, to improve the efficiency of RLS learning for the online realization of the HDP-DLQR control design. The learning strategy is designed to improve numerical stability and reduce computational cost, aiming to make real-time implementation of optimal control design methodologies based on actor-critic reinforcement learning feasible. The convergence behavior and numerical stability of the proposed online algorithm are evaluated through computational simulations on two multiple-input and multiple-output models: a fourth-order RLC circuit with two input voltages and two controllable voltage levels, and a doubly-fed induction generator with six inputs and six outputs for wind energy conversion systems.
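The central idea described in the abstract, propagating a triangular factor via QR decomposition instead of the covariance matrix in the critic's RLS update, can be illustrated with a short sketch. The Python code below is a minimal, hypothetical example of a QR-factorization-based RLS update for the quadratic value-function weights of a DLQR critic; the names (`QRRLSCritic`, `quad_basis`) and the surrogate data in the usage section are illustrative assumptions, not the paper's exact algorithm or notation.

```python
import numpy as np

def quad_basis(x):
    """Quadratic basis phi(x) such that V(x) = w^T phi(x) = x^T P x.
    Uses the non-redundant upper-triangular terms of the outer product x x^T."""
    n = len(x)
    outer = np.outer(x, x)
    idx = np.triu_indices(n)
    phi = outer[idx].copy()
    off_diagonal = idx[0] != idx[1]
    phi[off_diagonal] *= 2.0  # off-diagonal terms of P appear twice in x^T P x
    return phi

class QRRLSCritic:
    """Critic weight estimator that keeps the upper-triangular factor R of the
    information matrix and re-triangularizes it with a QR decomposition at each
    step, instead of propagating the covariance matrix as conventional RLS does.
    Illustrative sketch only, not the algorithm published in the paper."""

    def __init__(self, dim, delta=1e-3, lam=1.0):
        self.R = np.sqrt(delta) * np.eye(dim)  # square-root information matrix
        self.z = np.zeros(dim)                 # transformed right-hand side Q^T b
        self.lam = lam                         # forgetting factor

    def update(self, phi, target):
        sqrt_lam = np.sqrt(self.lam)
        # Stack the weighted previous factor with the new regression row and
        # restore triangular form with a single QR decomposition.
        A = np.vstack([sqrt_lam * self.R, phi[None, :]])
        b = np.concatenate([sqrt_lam * self.z, [target]])
        Q, R_new = np.linalg.qr(A)
        self.R = R_new
        self.z = Q.T @ b
        return self.weights()

    def weights(self):
        # Back-substitution on the triangular system R w = z.
        return np.linalg.solve(self.R, self.z)

# Hypothetical usage: each HDP critic step forms a Bellman target from the
# stage cost and the previous value estimate, then refines the weights.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    critic = QRRLSCritic(dim=n * (n + 1) // 2)
    w = critic.weights()
    gamma = 1.0
    for _ in range(50):
        x_k = rng.standard_normal(n)
        x_k1 = 0.9 * x_k + 0.1 * rng.standard_normal(n)  # surrogate closed-loop step
        r_k = x_k @ x_k                                   # surrogate stage cost
        d = r_k + gamma * w @ quad_basis(x_k1)            # Bellman target
        w = critic.update(quad_basis(x_k), d)
```

Re-triangularizing the stacked factor avoids forming the covariance matrix explicitly, which is the usual motivation for square-root (QR-based) RLS variants when numerical stability matters in online use.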
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Ernesto F.M. Ferreira, Patrícia H.M. Rêgo, João V.F. Neto