| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 480129 | European Journal of Operational Research | 2013 | 7 Pages | |
This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms that use a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the existing algorithm in the literature.
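As a point of reference only, the sketch below illustrates the discrete-time Markov special case of this kind of potential-based, single-sample-path gradient estimation, where the average-reward derivative is estimated via the classical sensitivity relation dη/dθ = π (dP/dθ) g, with π the stationary distribution and g the performance potentials. The parameterized chain, the helper names, and all numerical values are hypothetical and are not taken from the paper, which treats the more general semi-Markov (random sojourn time) setting.

```python
import numpy as np

def transition_matrix(theta):
    """Hypothetical 3-state parameterized chain; rows sum to 1 for theta in (0, 0.5)."""
    return np.array([
        [0.5 - theta, 0.3 + theta, 0.2],
        [0.2,         0.5,         0.3],
        [0.3 + theta, 0.2,         0.5 - theta],
    ])

def transition_derivative():
    """dP/dtheta for the hypothetical chain above (constant in theta here)."""
    return np.array([
        [-1.0, 1.0,  0.0],
        [ 0.0, 0.0,  0.0],
        [ 1.0, 0.0, -1.0],
    ])

def estimate_gradient(theta, f, path_len=100_000, horizon=50, seed=0):
    """Single-sample-path estimate of d(eta)/d(theta) using the potential-based
    sensitivity formula  d(eta)/d(theta) = sum_i pi_i * sum_j dP_ij * g_j."""
    rng = np.random.default_rng(seed)
    P, dP = transition_matrix(theta), transition_derivative()
    n = P.shape[0]

    # Simulate one long sample path of the chain.
    x = np.empty(path_len, dtype=int)
    x[0] = 0
    for t in range(path_len - 1):
        x[t + 1] = rng.choice(n, p=P[x[t]])

    rewards = f[x]
    eta = rewards.mean()  # average-reward estimate from the same path

    # Estimate potentials g_j by truncated sums of (f - eta) following visits to j.
    g_sum = np.zeros(n)
    g_cnt = np.zeros(n)
    for t in range(path_len - horizon):
        j = x[t]
        g_sum[j] += (rewards[t:t + horizon] - eta).sum()
        g_cnt[j] += 1
    g = g_sum / np.maximum(g_cnt, 1)

    pi = np.bincount(x, minlength=n) / path_len  # visit frequencies approximate pi
    return pi @ dP @ g, eta

f = np.array([1.0, 0.0, 2.0])  # hypothetical state rewards
grad, eta = estimate_gradient(theta=0.1, f=f)
print(f"estimated average reward {eta:.4f}, estimated gradient {grad:.4f}")
```

Everything here, including the truncation horizon for the potentials and the frequency estimate of π, is a generic illustration of single-path gradient estimation; the paper's algorithms differ in that they handle semi-Markov sojourn times and are designed to need less storage than the existing algorithm.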
► We present a basic formula for gradient estimation of SMDPs.
► The formula follows directly from a sensitivity equation in perturbation analysis.
► We develop three gradient estimation algorithms based on the basic formula.
► These algorithms generalize their discrete-time Markov counterparts.
► These algorithms require less storage than the existing algorithm.