Article code: 752492
Journal code: 895434
Publication year: 2010
English paper: 7-page PDF
Full-text version: Free download
English title of the ISI paper
An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes
Related subjects
Engineering and Basic Sciences; Other Engineering Disciplines; Control and Systems Engineering
English abstract

We develop in this article the first actor–critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy.
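
For context, one common way to write such a constrained discounted cost problem and its Lagrangian relaxation is sketched below; the symbols c, g_i, alpha_i, and lambda_i are illustrative placeholders and are not taken from the paper itself.

```latex
% Constrained discounted cost MDP and its Lagrangian relaxation
% (illustrative notation; c, g_i, \alpha_i, \lambda_i are assumed symbols)
\begin{aligned}
\min_{\pi}\; & J(\pi) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \,\middle|\, \pi\right] \\
\text{s.t.}\; & G_i(\pi) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, g_i(s_t, a_t) \,\middle|\, \pi\right] \;\le\; \alpha_i, \quad i = 1,\dots,N, \\
& L(\pi, \lambda) \;=\; J(\pi) + \sum_{i=1}^{N} \lambda_i \bigl(G_i(\pi) - \alpha_i\bigr), \qquad \lambda_i \ge 0 .
\end{aligned}
```

The Lagrange multiplier method then trades the constrained problem for the unconstrained minimization of L over policies, coupled with a slow ascent in the multipliers lambda_i.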

Research highlights
► We consider the problem of finding an optimal control policy for a constrained discounted cost Markov decision process when the state and action spaces can be large.
► We present the first actor-critic algorithm with function approximation for this problem.
► Our algorithm is based on the Lagrange multiplier method and combines aspects of temporal difference learning and simultaneous perturbation stochastic approximation (a minimal illustrative sketch of these ingredients follows this list).
► We prove the convergence of our algorithm to a constrained locally optimal policy.
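
The sketch below is a rough, hypothetical illustration of how these ingredients — a TD(0) critic with linear function approximation, an actor driven by a one-sided SPSA gradient estimate, a projected multiplier ascent step, and separated step-size schedules — can fit together on a toy MDP. It is not the paper's algorithm; the toy MDP, the features, the step-size exponents, and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- A small synthetic constrained MDP (illustrative, not from the paper) ---
n_states, n_actions = 10, 3
gamma = 0.95
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel
cost = rng.random((n_states, n_actions))                           # objective cost c(s, a)
g = rng.random((n_states, n_actions))                              # constraint cost g(s, a)
alpha = 0.5 * g.mean() / (1.0 - gamma)                             # constraint bound (hypothetical)

phi = np.eye(n_states)      # critic features (tabular here, for simplicity)
n_features = n_states

def policy_probs(theta, s):
    """Softmax policy over per-(state, action) parameters theta."""
    z = theta[s] - theta[s].max()
    e = np.exp(z)
    return e / e.sum()

def td_step(v, lam, s, a, s_next, lr):
    """One TD(0) update on the linear value estimate of the Lagrangian cost c + lam * g."""
    r = cost[s, a] + lam * g[s, a]
    delta = r + gamma * phi[s_next] @ v - phi[s] @ v
    return v + lr * delta * phi[s]

# Actor parameters, two critics (nominal and SPSA-perturbed policy), Lagrange multiplier
theta = np.zeros((n_states, n_actions))
v, v_pert = np.zeros(n_features), np.zeros(n_features)
lam = 0.0
delta_spsa = 0.05

s = 0
for k in range(1, 50001):
    # Multi-timescale step sizes: critic fastest, actor slower, multiplier slowest
    a_critic, a_actor, a_lam = 1.0 / k**0.55, 1.0 / k**0.8, 1.0 / k

    # SPSA: perturb every policy parameter with an i.i.d. +/-1 component
    Delta = rng.choice([-1.0, 1.0], size=theta.shape)
    theta_pert = theta + delta_spsa * Delta

    # Simulate one transition under each policy and update the matching critic
    a0 = rng.choice(n_actions, p=policy_probs(theta, s))
    s0 = rng.choice(n_states, p=P[s, a0])
    v = td_step(v, lam, s, a0, s0, a_critic)

    a1 = rng.choice(n_actions, p=policy_probs(theta_pert, s))
    s1 = rng.choice(n_states, p=P[s, a1])
    v_pert = td_step(v_pert, lam, s, a1, s1, a_critic)

    # Actor: one-sided SPSA estimate of the Lagrangian value gradient, descent step
    grad_est = (phi[s] @ v_pert - phi[s] @ v) / (delta_spsa * Delta)
    theta -= a_actor * grad_est

    # Multiplier: slowest-timescale ascent on a crude one-sample constraint-violation estimate
    lam = max(0.0, lam + a_lam * (g[s, a0] / (1.0 - gamma) - alpha))

    s = s0
```

The three step-size sequences decay at different rates so that the critic tracks the value of the current policy, the actor moves on an intermediate timescale, and the Lagrange multiplier is adjusted slowest, mirroring the multi-timescale stochastic approximation structure described in the abstract.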

Publisher
Database: Elsevier - ScienceDirect
Journal: Systems & Control Letters - Volume 59, Issue 12, December 2010, Pages 760–766
Authors