Online finite-horizon optimal learning algorithm for nonzero-sum games with partially unknown dynamics and constrained inputs

Neurocomputing, 2016 (8 pages)

Abstract

In this paper, an online optimal learning algorithm based on the adaptive dynamic programming (ADP) approach is designed to solve the finite-horizon optimal control problem for multi-player nonzero-sum games with partially unknown dynamics and constrained control inputs. First, it is proved that the online policy iteration (PI) algorithm is equivalent to Newton's iteration. Second, a single neural network (NN) with time-varying activation functions is used for each player to approximate the time-varying solution of the coupled Hamilton-Jacobi-Bellman (HJB) equations in an online, forward-in-time manner. Control constraints are handled through non-quadratic cost functions. The convergence of the NN-based online optimal learning algorithm for multi-player nonzero-sum games is also proved. Finally, a simulation example illustrates the effectiveness of the proposed algorithm.
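The claim that policy iteration is equivalent to Newton's iteration can be illustrated in the simplest setting where it is classically known to hold (Kleinman's algorithm). The sketch below assumes a single player, an infinite horizon, no input constraints and known scalar dynamics dx/dt = a*x + b*u with cost integral of (q*x^2 + r*u^2). These are simplifications and not the paper's finite-horizon, multi-player, constrained setting. The function names are hypothetical. The sketch shows that each PI step (evaluate the current gain, then improve it) yields the same value coefficient as one Newton step on the Riccati equation F(p) = 2ap - (b^2/r)p^2 + q = 0.

```python
import math

# Illustrative assumptions (not the paper's setting): one player, infinite
# horizon, unconstrained input, known scalar dynamics dx/dt = a*x + b*u,
# cost = integral of (q*x**2 + r*u**2) dt, value V(x) = p*x**2.


def riccati_residual(p, a, b, q, r):
    """F(p) = 2ap - (b^2/r) p^2 + q; the optimal p solves F(p) = 0."""
    return 2.0 * a * p - (b * b / r) * p * p + q


def policy_iteration(k0, a, b, q, r, steps):
    """Kleinman-style PI for u = -k x, starting from a stabilizing gain k0."""
    k, values = k0, []
    for _ in range(steps):
        closed_loop = a - b * k
        assert closed_loop < 0, "policy must be stabilizing"
        # Policy evaluation: solve 2(a - bk)p + q + r k^2 = 0 for p.
        p = (q + r * k * k) / (-2.0 * closed_loop)
        values.append(p)
        # Policy improvement: minimizing the Hamiltonian gives k = b p / r.
        k = b * p / r
    return values


def newton_riccati(p0, a, b, q, r, steps):
    """Newton's method p <- p - F(p)/F'(p) applied to the Riccati equation."""
    p, values = p0, []
    for _ in range(steps):
        dF = 2.0 * a - 2.0 * (b * b / r) * p
        p = p - riccati_residual(p, a, b, q, r) / dF
        values.append(p)
    return values


if __name__ == "__main__":
    a, b, q, r = 1.0, 1.0, 1.0, 1.0
    pi_vals = policy_iteration(2.0, a, b, q, r, steps=6)
    # Newton started from PI's first evaluated value reproduces the later PI values.
    nt_vals = newton_riccati(pi_vals[0], a, b, q, r, steps=5)
    for x, y in zip(pi_vals[1:], nt_vals):
        assert math.isclose(x, y, rel_tol=1e-12)
    print(pi_vals[-1], 1.0 + math.sqrt(2.0))  # both approach the Riccati root
```

The equivalence follows by substituting k = b*p_old/r into the evaluation equation, which gives exactly F'(p_old)(p_new - p_old) = -F(p_old). Both sequences therefore converge quadratically to the stabilizing root 1 + sqrt(2) for these parameters.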

Keywords

Adaptive dynamic programming; Neural network