Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6865470 | Neurocomputing | 2016 | 8 Pages |
Abstract
This paper presents an online optimal learning algorithm, based on the adaptive dynamic programming (ADP) approach, for finite-horizon optimal control of multi-player nonzero-sum games with partially unknown dynamics and constrained control inputs. First, the online policy iteration (PI) algorithm is proved to be equivalent to Newton's iteration. Second, a single neural network (NN) with time-varying activation functions is used for each player to approximate the time-varying solution of the coupled Hamilton-Jacobi-Bellman (HJB) equations online and forward in time. Control constraints are handled through non-quadratic functions. Convergence of the NN-based online optimal learning algorithm for multi-player nonzero-sum games is also proved. Finally, a simulation example illustrates the effectiveness of the proposed algorithm.
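The claim that policy iteration coincides with Newton's iteration can be illustrated on a much simpler problem than the one the paper treats. The sketch below is not the authors' algorithm: it assumes a single-player, infinite-horizon, scalar linear-quadratic problem with unconstrained input, where the HJB equation reduces to an algebraic Riccati equation. The system parameters `a`, `b`, `q`, `r`, the starting value `p0`, and all function names are illustrative assumptions.

```python
# Assumed toy problem (not from the paper):
#   dynamics  dx/dt = a*x + b*u
#   cost      integral of (q*x^2 + r*u^2) dt
# The value function is V(x) = p*x^2, where p solves the scalar Riccati
# equation F(p) = 2*a*p - (b*p)^2 / r + q = 0.


def riccati_residual(p, a, b, q, r):
    """Left-hand side F(p) of the scalar algebraic Riccati equation."""
    return 2.0 * a * p - (b * p) ** 2 / r + q


def newton_step(p, a, b, q, r):
    """One Newton update p - F(p) / F'(p) on the Riccati equation."""
    slope = 2.0 * a - 2.0 * b ** 2 * p / r
    return p - riccati_residual(p, a, b, q, r) / slope


def policy_iteration_step(p, a, b, q, r):
    """One policy-iteration sweep: improve the gain, then evaluate it."""
    k = b * p / r  # policy improvement: u = -k * x
    closed_loop = a - b * k  # must be negative (stabilizing policy)
    # Policy evaluation solves 2*(a - b*k)*p_next + q + r*k^2 = 0.
    return -(q + r * k ** 2) / (2.0 * closed_loop)


def iterate(step, p0, n_iters, **params):
    """Apply `step` repeatedly and return every iterate, including p0."""
    history = [p0]
    for _ in range(n_iters):
        history.append(step(history[-1], **params))
    return history


# Illustrative values; p0 = 3 gives a stabilizing initial gain here.
params = dict(a=1.0, b=1.0, q=1.0, r=1.0)
pi_history = iterate(policy_iteration_step, 3.0, 8, **params)
newton_history = iterate(newton_step, 3.0, 8, **params)

if __name__ == "__main__":
    for p_pi, p_newton in zip(pi_history, newton_history):
        print(f"PI: {p_pi:.12f}   Newton: {p_newton:.12f}")
```

In this toy case the two sequences agree step by step and approach the stabilizing Riccati solution. The multi-player, finite-horizon, input-constrained setting described in the abstract replaces the scalar equation with coupled time-varying HJB equations, which this sketch does not attempt to reproduce.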
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Xiaohong Cui, Huaguang Zhang, Yanhong Luo, Peifu Zu