Article Code | Journal Code | Publication Year | English Article | Full-Text Version |
---|---|---|---|---|
752492 | 895434 | 2010 | 7-page PDF | Free download |

In this article, we develop the first actor–critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite-horizon discounted-cost framework, in which both the objective and the constraint functions are policy-dependent expected discounted sums of certain sample-path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm uses multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that performs a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy.
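To make the construction concrete, the sketch below is a minimal, hypothetical Python rendering of a Lagrangian actor–critic of this flavor, not the paper's actual algorithm. It assumes a small toy discounted-cost MDP with a single constraint G(θ) ≤ α, a tabular softmax policy, one-hot critic features (a degenerate case of linear function approximation), a plain Monte Carlo estimate of the constraint value in place of a dedicated constraint critic, and step-size schedules chosen only to respect the timescale separation critic ≫ actor ≫ multiplier. The Lagrangian is taken to be L(θ, λ) = J(θ) + λ(G(θ) − α): the TD(0) critic evaluates the per-stage Lagrangian cost c + λg, the actor descends along a two-sided SPSA gradient estimate built from a single ±1 Bernoulli perturbation of all policy parameters, and the multiplier performs projected ascent on the estimated constraint violation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy constrained discounted-cost MDP (hypothetical, for illustration only) ---
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9
ALPHA = 4.0                                            # constraint bound: G(theta) <= ALPHA
COST = rng.uniform(0.0, 1.0, (N_STATES, N_ACTIONS))    # single-stage cost c(s, a)
CCOST = rng.uniform(0.0, 1.0, (N_STATES, N_ACTIONS))   # single-stage constraint cost g(s, a)
P = rng.dirichlet(np.ones(N_STATES), (N_STATES, N_ACTIONS))  # transition kernel p(.|s, a)

def step(s, a):
    """Sample one transition; return next state, cost, and constraint cost."""
    return rng.choice(N_STATES, p=P[s, a]), COST[s, a], CCOST[s, a]

def features(s):
    """One-hot state features (a degenerate linear approximator, kept simple here)."""
    phi = np.zeros(N_STATES); phi[s] = 1.0
    return phi

def policy(theta, s):
    """Boltzmann (softmax) policy over actions, parameterized per (state, action)."""
    e = np.exp(theta[s] - theta[s].max())
    return e / e.sum()

def td_critic(theta, lam, v, s, n_steps, a_k):
    """TD(0) critic for the per-stage Lagrangian cost c + lam * g under policy theta."""
    for _ in range(n_steps):
        a = rng.choice(N_ACTIONS, p=policy(theta, s))
        s_next, c, g = step(s, a)
        delta = (c + lam * g) + GAMMA * v @ features(s_next) - v @ features(s)
        v = v + a_k * delta * features(s)
        s = s_next
    return v, s

def constraint_return(theta, s0, horizon=100):
    """Monte Carlo estimate of the discounted constraint cost G from s0 under theta."""
    s, total, disc = s0, 0.0, 1.0
    for _ in range(horizon):
        a = rng.choice(N_ACTIONS, p=policy(theta, s))
        s, _, g = step(s, a)
        total += disc * g
        disc *= GAMMA
    return total

# --- Three timescales: critic (fastest), SPSA actor, Lagrange multiplier (slowest) ---
theta, lam, s = np.zeros((N_STATES, N_ACTIONS)), 0.0, 0
v_plus, v_minus = np.zeros(N_STATES), np.zeros(N_STATES)
DELTA = 0.1                                            # SPSA perturbation magnitude

for k in range(1, 1001):
    a_k, b_k, c_k = 1.0 / k**0.55, 0.1 / k**0.75, 0.01 / k
    pert = rng.choice([-1.0, 1.0], size=theta.shape)   # common +/-1 Bernoulli perturbation
    v_plus, s = td_critic(theta + DELTA * pert, lam, v_plus, s, 20, a_k)
    v_minus, s = td_critic(theta - DELTA * pert, lam, v_minus, s, 20, a_k)
    # Actor: two-sided SPSA estimate of grad_theta L(theta, lam), using the value at state 0
    grad = (v_plus[0] - v_minus[0]) / (2.0 * DELTA * pert)
    theta = theta - b_k * grad                          # gradient descent in theta
    # Multiplier: projected ascent on the estimated constraint violation G(theta) - ALPHA
    lam = max(0.0, lam + c_k * (constraint_return(theta, 0) - ALPHA))

print("lambda:", round(lam, 3), "constraint estimate:", round(constraint_return(theta, 0), 3))
```

The three diminishing step-size sequences are what make this a multi-timescale scheme: the critic effectively sees a frozen policy, the actor sees converged value estimates, and the multiplier sees a policy that has locally minimized the Lagrangian for the current λ.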
Research highlights
► We consider the problem of finding an optimal control policy for a constrained discounted cost Markov decision process when the state and action spaces can be large.
► We present the first actor–critic algorithm with function approximation for this problem.
► Our algorithm is based on the Lagrange multiplier method and combines aspects of temporal difference learning and simultaneous perturbation stochastic approximation.
► We prove the convergence of our algorithm to a constrained locally optimal policy.
Journal: Systems & Control Letters - Volume 59, Issue 12, December 2010, Pages 760–766