Article code: 485830
Journal code: 703340
Publication year: 2012
English article: 6-page PDF
Full-text version: free download
English title of the ISI article
Approximate Policy Iteration for Markov Control Revisited
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
Abstract

Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy iteration. In this paper, we present and analyze an API algorithm for discounted reward based on (i) a classical temporal differences update for policy evaluation and (ii) simulation-based mean estimation for policy improvement. Further, we analyze for convergence API algorithms based on Q-factors for (i) discounted reward and (ii) for average reward MDPs. The average reward algorithm is based on relative value iteration; we also present results from some numerical experiments with it.
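
To make the two ingredients named in the abstract concrete, here is a minimal tabular sketch of approximate policy iteration: a TD(0) update evaluates the current policy from simulated transitions, and sample means of one-step returns drive the improvement step. It is an illustration under stated assumptions, not the paper's algorithm. The two-state MDP, the constant step size, the sample counts, and all function names are hypothetical.

```python
import random

# Hypothetical 2-state, 2-action MDP, used only for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(0.9, 0, 1.0), (0.1, 1, 0.0)],
        1: [(0.2, 0, 0.0), (0.8, 1, 2.0)]},
    1: {0: [(0.7, 0, 0.0), (0.3, 1, 1.0)],
        1: [(0.1, 0, 0.0), (0.9, 1, 3.0)]},
}
GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.01  # constant TD step size (assumed; a diminishing rule is also common)


def step(rng, s, a):
    """Simulator: sample (next_state, reward). The learner only calls this
    function and never reads the probabilities in P directly."""
    u, cum = rng.random(), 0.0
    for p, s2, r in P[s][a]:
        cum += p
        if u < cum:
            return s2, r
    return s2, r  # guard against floating-point round-off


def evaluate_td0(rng, policy, n_steps=50_000):
    """Policy evaluation: TD(0) along one simulated trajectory."""
    V = {s: 0.0 for s in P}
    s = 0
    for _ in range(n_steps):
        s2, r = step(rng, s, policy[s])
        V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
        s = s2
    return V


def estimate_q(rng, V, s, a, n_samples=2_000):
    """Policy improvement input: sample mean of r + gamma * V(s')."""
    total = 0.0
    for _ in range(n_samples):
        s2, r = step(rng, s, a)
        total += r + GAMMA * V[s2]
    return total / n_samples


def approximate_policy_iteration(seed=0, n_iterations=6):
    rng = random.Random(seed)
    policy = {s: 0 for s in P}  # arbitrary starting policy
    for _ in range(n_iterations):
        V = evaluate_td0(rng, policy)
        # Greedy improvement with respect to the estimated Q-values.
        policy = {s: max(P[s], key=lambda a: estimate_q(rng, V, s, a))
                  for s in P}
    return policy


if __name__ == "__main__":
    print(approximate_policy_iteration())
```

The transition table `P` is consulted only inside `step`, which mirrors the stated goal of bypassing the transition probabilities: the learner sees sampled transitions and rewards, nothing else. On this toy problem, exact policy iteration selects action 1 in both states, so the simulated version can be checked against that answer.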

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 12, 2012, Pages 90-95