Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
751933 | Systems & Control Letters | 2016 | 6 Pages |
Abstract
We propose a novel actor–critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routing application.
Related Topics
Physical Sciences and Engineering
Engineering
Control and Systems Engineering
Authors
Prashanth L.A., Prasad H.L., Shalabh Bhatnagar, Prakash Chandra