Article code | Journal code | Publication year | English paper | Full-text version
10346669 | 698875 | 2005 | 16-page PDF | Free download
English title of the ISI article
An empirical study of policy convergence in Markov decision process value iteration
Related subjects
Engineering and Basic Sciences > Computer Engineering > Computer Science (General)
English abstract
The value iteration algorithm is a well-known technique for generating solutions to discounted Markov decision process (MDP) models. Although simple to implement, the approach is nevertheless limited in situations where many Markov decision processes must be solved, such as in real-time state-based control problems or in simulation/optimization problems, because of the potentially large number of iterations required for the value function to converge to an ε-optimal solution. Experimental results suggest, however, that the sequence of solution policies associated with each iteration of the algorithm converges much more rapidly than does the value function. This behavior has significant implications for designing solution approaches for MDPs, yet it has not been explicitly characterized in the literature, nor has it generated significant discussion. This paper seeks to generate such discussion by providing comparative empirical convergence results and exploring several predictors that allow estimation of policy convergence speed based on existing MDP parameters.
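The phenomenon described in the abstract can be illustrated with a short numerical experiment. The following is a minimal sketch, not the paper's experimental setup: it runs value iteration on a small randomly generated MDP and records both the iteration at which the greedy policy last changes and the iteration at which the Bellman residual meets a standard ε-optimality stopping rule. All names and parameters here (n_states, n_actions, gamma, epsilon, the random transition and reward arrays) are illustrative assumptions.

```python
# Minimal sketch (not the paper's experiments): value iteration on a small random MDP,
# comparing when the greedy policy stabilizes with when the value function is
# epsilon-optimal by the usual Bellman-residual stopping criterion.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 50, 4          # illustrative problem size
gamma, epsilon = 0.95, 1e-4          # illustrative discount factor and tolerance

# Random transition probabilities P[a, s, s'] (rows normalized) and rewards R[s, a].
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))

# Stopping threshold on the Bellman residual that guarantees an
# epsilon-optimal value function for discount factor gamma.
threshold = epsilon * (1.0 - gamma) / (2.0 * gamma)

V = np.zeros(n_states)
prev_policy = None
last_policy_change = 0

for k in range(1, 10_000):
    Q = R + gamma * (P @ V).T        # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
    V_new = Q.max(axis=1)            # Bellman optimality backup
    policy = Q.argmax(axis=1)        # greedy policy induced by the current value function

    if prev_policy is None or not np.array_equal(policy, prev_policy):
        last_policy_change = k       # greedy policy still changing at this iteration
    prev_policy = policy

    if np.max(np.abs(V_new - V)) < threshold:
        print(f"value function epsilon-converged at iteration {k}")
        print(f"greedy policy last changed at iteration {last_policy_change}")
        break
    V = V_new
```

In runs of this kind, the greedy policy typically stops changing many iterations before the residual criterion is satisfied, which is the gap between policy convergence and value-function convergence that the paper examines empirically.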
Publisher
Database: Elsevier - ScienceDirect
Journal: Computers & Operations Research - Volume 32, Issue 1, January 2005, Pages 127-142
Authors