Semi-Markov decision processes with limiting ratio average rewards

Article ID	Journal	Published Year	Pages	File Type
5774521	Journal of Mathematical Analysis and Applications	2017	8 Pages	PDF

Abstract

We prove that a semi-Markov decision process with finite state and action spaces and limiting ratio average (undiscounted) payoff has an optimal pure semi-stationary policy (i.e., a semi-Markov policy that is independent of the decision epoch count). We conclude by showing, with the aid of an example, that this result cannot be strengthened further. A crude but finite-step algorithm is given to compute such an optimal policy.
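
To make the payoff criterion concrete, here is a minimal sketch in Python. It is not the algorithm from the paper. It assumes the ratio average payoff over the first n decision epochs is the expected accumulated reward divided by the expected accumulated sojourn time, and it evaluates that ratio for one fixed pure stationary policy. The function name `ratio_average` and the inputs `P` (transition matrix under the policy), `r` (expected one-step rewards), `tau` (expected sojourn times) and `start` (initial state) are hypothetical names chosen for illustration.

```python
def ratio_average(P, r, tau, start, n_epochs):
    """Expected total reward divided by expected total sojourn time
    over the first n_epochs decision epochs, starting in state `start`.

    Assumption: P, r and tau already describe a single fixed policy.
    """
    n_states = len(P)
    # Distribution of the state at the current decision epoch.
    dist = [0.0] * n_states
    dist[start] = 1.0

    total_reward = 0.0
    total_time = 0.0
    for _ in range(n_epochs):
        # Expected reward and expected sojourn time at this epoch.
        total_reward += sum(d * x for d, x in zip(dist, r))
        total_time += sum(d * t for d, t in zip(dist, tau))
        # Advance the state distribution by one transition.
        dist = [
            sum(dist[i] * P[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return total_reward / total_time


# Hypothetical two-state example: the chain alternates between states 0 and 1.
P = [[0.0, 1.0], [1.0, 0.0]]
r = [1.0, 3.0]     # expected reward per visit
tau = [2.0, 1.0]   # expected sojourn time per visit
```

In this example every two epochs contribute reward 4 and time 3, so `ratio_average(P, r, tau, 0, n)` equals 4/3 for even `n` and tends to 4/3 as `n` grows. In a chain with several closed classes the value can differ between starting states, which is why `start` is an explicit argument.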

Keywords

Semi-Markov decision process