Article code | Journal code | Publication year | English article | Full-text version
4999930 | 1460642 | 2016 | 10-page PDF | Free download
English title of the ISI article
Optimization of Markov decision processes under the variance criterion
Persian translation of the title
بهینه سازی فرآیند تصمیم گیری مارکوف تحت معیار واریانس
Keywords
Markov decision process, variance criterion, sensitivity-based optimization, policy iteration, policy gradient
Related topics
Engineering and Basic Sciences, Other Engineering Disciplines, Control and Systems Engineering
English abstract
In this paper, we study a variance minimization problem in an infinite stage discrete time Markov decision process (MDP), regardless of the mean performance. For the Markov chain under the variance criterion, since the value of the cost function at the current stage will be affected by future actions, this problem is not a standard MDP and the traditional MDP theory is not applicable. In this paper, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo variance. Then we derive a variance difference formula that quantifies the difference of variances of Markov systems under any two policies. With the difference formula, the correlation of the variance cost function at different stages can be decoupled through a nonnegative term. A necessary condition of the optimal policy is obtained. It is also proved that the optimal policy with the minimal variance can be found in the deterministic policy space. Furthermore, we propose an efficient iterative algorithm to reduce the variance of Markov systems. We prove that this algorithm can converge to a local optimum. Finally, a numerical experiment is conducted to demonstrate the efficiency of our algorithm compared with the gradient-based method widely adopted in the literature.
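The quantities named in the abstract can be illustrated with a small sketch. This is not the paper's algorithm. It assumes that the variance criterion means the steady-state variance of the one-stage cost, and that "pseudo variance" means the mean squared deviation of the cost from a fixed reference constant; the abstract does not state either definition. The transition matrix `P`, the cost vector `f`, and all function names are hypothetical.

```python
# Minimal sketch under stated assumptions; not the method from the paper.
# Assumption: under a fixed policy the MDP reduces to an ergodic Markov chain
# with transition matrix P and one-stage cost f(i) in state i.

def stationary_distribution(P, iters=10_000, tol=1e-13):
    """Approximate the stationary distribution of P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

def mean_cost(pi, f):
    """Long-run average cost: sum_i pi(i) * f(i)."""
    return sum(p * c for p, c in zip(pi, f))

def pseudo_variance(pi, f, ref):
    """Assumed definition: mean squared deviation from a fixed constant ref."""
    return sum(p * (c - ref) ** 2 for p, c in zip(pi, f))

def steady_state_variance(pi, f):
    """Variance of the cost: the pseudo variance evaluated at the true mean."""
    return pseudo_variance(pi, f, mean_cost(pi, f))

# Hypothetical two-state chain induced by some fixed policy.
P = [[0.9, 0.1],
     [0.5, 0.5]]
f = [1.0, 4.0]

pi = stationary_distribution(P)
eta = mean_cost(pi, f)
var = steady_state_variance(pi, f)
gap = pseudo_variance(pi, f, 2.0) - var  # equals (eta - 2.0) ** 2 >= 0
```

Under these assumed definitions, the pseudo variance with any reference constant exceeds the true variance by the squared distance between that constant and the mean. Because the reference constant is held fixed, the squared-deviation cost no longer depends on the policy through the mean, which is one way to read the abstract's remark about decoupling through a nonnegative term.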
Publisher
Database: Elsevier - ScienceDirect
Journal: Automatica - Volume 73, November 2016, Pages 269-278