Article ID: 751933
Journal: Systems & Control Letters
Published Year: 2016
Pages: 6
File Type: PDF
Abstract

We propose a novel actor–critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routing application.
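To make the actor–critic structure concrete, the sketch below shows a generic tabular actor–critic for a small discounted MDP. It is not the algorithm proposed in the paper. The abstract does not specify the non-linear optimization problem behind the actor's descent direction, so this sketch substitutes the standard softmax policy-gradient step driven by the critic's TD error. Everything here is a hypothetical illustration: the names `actor_critic` and `softmax`, the tabular transition model `P[s][a]` and reward table `R[s][a]`, and the step sizes. The critic's step size is larger than the actor's, which reflects the two-timescale arrangement commonly used in convergence analyses of actor–critic methods.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def actor_critic(P, R, gamma=0.9, steps=50_000,
                 alpha_critic=0.05, alpha_actor=0.005, seed=0):
    """Generic tabular actor-critic (hypothetical sketch, NOT the paper's method).

    P[s][a] is a list of next-state probabilities and R[s][a] is the reward
    for taking action a in state s. Returns the critic's value estimates V
    and the actor's softmax preferences theta.
    """
    rng = random.Random(seed)
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s                                 # critic: state-value estimates
    theta = [[0.0] * n_a for _ in range(n_s)]       # actor: policy preferences
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = rng.choices(range(n_a), weights=pi)[0]
        s_next = rng.choices(range(n_s), weights=P[s][a])[0]
        r = R[s][a]
        # TD(0) error for the discounted criterion.
        delta = r + gamma * V[s_next] - V[s]
        # Critic update (faster timescale).
        V[s] += alpha_critic * delta
        # Actor update (slower timescale): delta times grad of log softmax.
        # This stands in for the paper's unspecified descent direction.
        for b in range(n_a):
            grad_log = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += alpha_actor * delta * grad_log
        s = s_next
    return V, theta


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP: action 1 pays reward 1, action 0 pays 0,
    # and transitions are uniform regardless of the action taken.
    P = [[[0.5, 0.5], [0.5, 0.5]],
         [[0.5, 0.5], [0.5, 0.5]]]
    R = [[0.0, 1.0],
         [0.0, 1.0]]
    V, theta = actor_critic(P, R)
    greedy = [max(range(2), key=lambda b: theta[s][b]) for s in range(2)]
    print("values:", V)
    print("greedy actions:", greedy)
```

On this toy problem the learned preferences should favour action 1 in both states, and the value estimates should approach the discounted return of roughly 1 / (1 - gamma) as the policy concentrates on that action. A function-approximation variant, as mentioned in the abstract, would replace the tables `V` and `theta` with parameterized features; that extension is not shown here.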
