کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
697870 890385 2006 11 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
The control of a two-level Markov decision process by time aggregation
موضوعات مرتبط
مهندسی و علوم پایه سایر رشته های مهندسی کنترل و سیستم های مهندسی
پیش نمایش صفحه اول مقاله
The control of a two-level Markov decision process by time aggregation
چکیده انگلیسی
The solution of Markov Decision Processes (MDPs) often relies on special properties of the processes. For two-level MDPs, the difference in the rates of state changes of the upper and lower levels has led to limiting or approximate solutions of such problems. In this paper, we solve a two-level MDP without making any assumption on the rates of state changes of the two levels. We first show that such a two-level MDP is a non-standard one where the optimal actions of different states can be related to each other. Then we give assumptions (conditions) under which such a specially constrained MDP can be solved by policy iteration. We further show that the computational effort can be reduced by decomposing the MDP. A two-level MDP with M upper-level states can be decomposed into one MDP for the upper level and M to M(M-1) MDPs for the lower level, depending on the structure of the two-level MDP. The upper-level MDP is solved by time aggregation, a technique introduced in a recent paper [Cao, X.-R., Ren, Z. Y., Bhatnagar, S., Fu, M., & Marcus, S. (2002). A time aggregation approach to Markov decision processes. Automatica, 38(6), 929-943.], and the lower-level MDPs are solved by embedded Markov chains.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Automatica - Volume 42, Issue 3, March 2006, Pages 393-403
نویسندگان
, ,