Article code: 696198
Journal code: 890327
Publication year: 2014
English article: 4 pages, PDF
Full-text version: Free download
English title of the ISI article
Value set iteration for Markov decision processes
Persian translation of the title
ارزش تکرار برای فرآیندهای تصمیم مارکوف مجموعه
Related subjects
Engineering and basic sciences; other engineering disciplines; control and systems engineering
English abstract

This communique presents an algorithm called “value set iteration” (VSI) for solving infinite-horizon discounted Markov decision processes with finite state and action spaces. VSI is a simple generalization of value iteration (VI) and a counterpart to Chang’s policy set iteration. At each iteration, VSI manipulates a set of value functions to generate a sequence of value functions that converges to the optimal value function. VSI preserves the convergence properties of VI and converges no slower than VI. In particular, if the set used in VSI contains the value functions of independently generated sample policies from a given distribution and of a properly defined policy switching policy, a probabilistic exponential convergence rate can be established for VSI. Because the set used in VSI can contain the value functions of any policies generated by other existing algorithms, VSI also serves as a general framework for combining multiple solution methods.
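
The abstract does not give the VSI update rule, so the following is only a minimal sketch of one plausible reading, not the paper's definition. It assumes that each iteration lifts the current value function to the pointwise maximum over a fixed set of candidate value functions and then applies the usual Bellman optimality operator. The MDP, the function names (`bellman`, `evaluate_policy`, `value_set_iteration`, `random_mdp`) and all parameters are hypothetical and chosen for illustration.

```python
import random

GAMMA = 0.9  # discount factor (assumed value)

def bellman(V, P, R, gamma=GAMMA):
    """Bellman optimality operator: max over actions of reward plus discounted expectation."""
    return [
        max(R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(R[s])))
        for s in range(len(V))
    ]

def evaluate_policy(policy, P, R, gamma=GAMMA, sweeps=200):
    """Approximate the value of a fixed policy by repeated backups from zero.
    With nonnegative rewards the iterates rise toward the policy's value,
    so the result never exceeds the optimal value function."""
    V = [0.0] * len(policy)
    for _ in range(sweeps):
        V = [R[s][policy[s]]
             + gamma * sum(p * v for p, v in zip(P[s][policy[s]], V))
             for s in range(len(policy))]
    return V

def value_iteration(P, R, iters, gamma=GAMMA):
    """Plain value iteration started from the zero function."""
    V = [0.0] * len(R)
    for _ in range(iters):
        V = bellman(V, P, R, gamma)
    return V

def value_set_iteration(P, R, candidates, iters, gamma=GAMMA):
    """Assumed VSI-style update: lift V to the pointwise max over the
    candidate set, then apply one Bellman backup."""
    V = [0.0] * len(R)
    for _ in range(iters):
        lifted = [max([V[s]] + [u[s] for u in candidates])
                  for s in range(len(V))]
        V = bellman(lifted, P, R, gamma)
    return V

def random_mdp(n_states, n_actions, rng):
    """Random finite MDP with rewards in [0, 1); P[s][a] is a distribution over next states."""
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R

rng = random.Random(0)
P, R = random_mdp(5, 3, rng)

# Candidate set: value functions of a few randomly sampled policies.
policies = [[rng.randrange(3) for _ in range(5)] for _ in range(4)]
candidates = [evaluate_policy(pi, P, R) for pi in policies]

V_star = value_iteration(P, R, 500)  # reference solution

def sup_error(V):
    return max(abs(a - b) for a, b in zip(V, V_star))

vi_err = sup_error(value_iteration(P, R, 10))
vsi_err = sup_error(value_set_iteration(P, R, candidates, 10))
vsi_limit_err = sup_error(value_set_iteration(P, R, candidates, 500))
print("VI error after 10 iterations: ", vi_err)
print("VSI error after 10 iterations:", vsi_err)
```

In this sketch every candidate is a lower bound on the optimal value function, so lifting can only move the iterate closer to the optimum before the backup. That is one way to see why such a scheme would be no slower than VI when both start from zero.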

Publisher
Database: Elsevier - ScienceDirect
Journal: Automatica - Volume 50, Issue 7, July 2014, Pages 1940–1943