Approximate Policy Iteration for Markov Control Revisited

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
485830	703340	2012	6 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر علوم کامپیوتر (عمومی)

پیش نمایش صفحه اول مقاله

Approximate Policy Iteration for Markov Control Revisited

چکیده انگلیسی

Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy iteration. In this paper, we present and analyze an API algorithm for discounted reward based on (i) a classical temporal differences update for policy evaluation and (ii) simulation-based mean estimation for policy improvement. Further, we analyze for convergence API algorithms based on Q-factors for (i) discounted reward and (ii) for average reward MDPs. The average reward algorithm is based on relative value iteration; we also present results from some numerical experiments with it.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Procedia Computer Science - Volume 12, 2012, Pages 90-95

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Approximate Policy Iteration for Markov Control Revisited

دسترسی سریع

ارتباط

English Website