Solving average cost Markov decision processes by means of a two-phase time aggregation algorithm

Article ID	Journal	Published Year	Pages	File Type
6897022	European Journal of Operational Research	2015	9 Pages	PDF

Abstract

This paper introduces a two-phase approach for solving average cost Markov decision processes, based on state space embedding or time aggregation. In the first phase, time aggregation is applied to optimize the policy on a prescribed subset of the state space, and a novel result is used to extend the resulting evaluation to the whole state space. In the second phase, this evaluation drives a policy improvement step. The two phases are alternated until convergence is attained. Numerical experiments illustrate the results.
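
The alternation of evaluation and improvement described above can be illustrated with a minimal Python sketch. It is not the paper's time aggregation algorithm: it is plain average cost policy iteration on the full state space, assuming a unichain model small enough to evaluate exactly. All names (`solve_linear`, `evaluate`, `improve`, `policy_iteration`) and the two-state data `P` and `c` are hypothetical and chosen only for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(P, c, policy):
    """Return (gain g, relative values h) of a policy, normalised by h[0] = 0.

    Solves g + h[s] - sum_t P[s][a][t] * h[t] = c[s][a] with a = policy[s].
    The unknown vector is (g, h[1], ..., h[n-1]). Assumes a unichain model.
    """
    n = len(policy)
    A, b = [], []
    for s in range(n):
        a = policy[s]
        row = [1.0]  # coefficient of the gain g
        for t in range(1, n):
            row.append((1.0 if s == t else 0.0) - P[s][a][t])
        A.append(row)
        b.append(c[s][a])
    x = solve_linear(A, b)
    return x[0], [0.0] + x[1:]


def improve(P, c, h, policy):
    """Greedy policy improvement step with respect to the relative values h."""
    new_policy = []
    for s in range(len(policy)):
        def q(a):
            return c[s][a] + sum(p * ht for p, ht in zip(P[s][a], h))
        best = min(range(len(c[s])), key=q)
        # Keep the current action on ties so the iteration cannot cycle.
        if q(policy[s]) <= q(best) + 1e-12:
            best = policy[s]
        new_policy.append(best)
    return new_policy


def policy_iteration(P, c, policy):
    """Alternate evaluation and improvement until the policy stops changing."""
    while True:
        g, h = evaluate(P, c, policy)
        new_policy = improve(P, c, h, policy)
        if new_policy == policy:
            return policy, g, h
        policy = new_policy


# Hypothetical toy model: P[s][a] is the next-state distribution, c[s][a] the cost.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0, actions 0 and 1
    [[0.5, 0.5], [0.1, 0.9]],  # state 1, actions 0 and 1
]
c = [[2.0, 3.0], [1.0, 0.5]]

policy, g, h = policy_iteration(P, c, [0, 0])
print(policy, round(g, 4))
```

For this toy data the iteration stops after one improvement at the policy `[1, 1]`, whose average cost is 7/9. In the two-phase scheme of the abstract, the exact evaluation over all states is the step that would be replaced by time aggregation on a subset followed by an extension to the whole state space.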

Keywords

Dynamic programming; Time aggregation; Embedding; Markov decision processes; Stochastic optimal control