Article ID: 480129
Journal: European Journal of Operational Research
Published Year: 2013
Pages: 7
File Type: PDF
Abstract

This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. Based on it, we develop three sample-path-based gradient estimation algorithms, each of which uses a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the existing algorithm in the literature.
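To make the setting concrete, the following is a minimal sketch of how the average reward of a semi-Markov process can be estimated from a single sample path. It is not one of the paper's three gradient algorithms, and the basic formula itself is not reproduced here. The two-state model, the exponential sojourn times, the reward rates, and all function names are assumptions chosen for illustration only. The estimate is the accumulated reward divided by the accumulated time, and it is compared with the standard renewal-reward expression built from the embedded chain's stationary distribution.

```python
import random


def simulate_average_reward(P, mean_sojourn, reward_rate, n_transitions, seed=0):
    """Estimate the long-run average reward from one simulated sample path.

    Illustrative assumptions: sojourn times are exponential with the given
    means, and reward accrues at a constant rate while the state is occupied.
    """
    rng = random.Random(seed)
    state = 0
    total_reward = 0.0
    total_time = 0.0
    for _ in range(n_transitions):
        # Draw the sojourn time in the current state.
        sojourn = rng.expovariate(1.0 / mean_sojourn[state])
        total_reward += reward_rate[state] * sojourn
        total_time += sojourn
        # Jump according to the embedded Markov chain.
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return total_reward / total_time


def exact_average_reward_two_state(P, mean_sojourn, reward_rate):
    """Renewal-reward value for a two-state model (hypothetical helper)."""
    p, q = P[0][1], P[1][0]
    # Stationary distribution of the two-state embedded chain.
    pi = (q / (p + q), p / (p + q))
    num = sum(pi[i] * mean_sojourn[i] * reward_rate[i] for i in range(2))
    den = sum(pi[i] * mean_sojourn[i] for i in range(2))
    return num / den


# Hypothetical model parameters, not taken from the paper.
P = [[0.7, 0.3], [0.4, 0.6]]
mean_sojourn = [1.0, 2.0]
reward_rate = [1.0, 3.0]

estimate = simulate_average_reward(P, mean_sojourn, reward_rate, 200_000)
exact = exact_average_reward_two_state(P, mean_sojourn, reward_rate)
print(f"sample-path estimate: {estimate:.3f}, renewal-reward value: {exact:.3f}")
```

The estimator only keeps two running sums along the path, which is the sense in which single-sample-path methods can be light on storage. A gradient estimator would additionally track how the path statistics respond to a policy parameter, which this sketch does not attempt.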

► We present a basic formula for gradient estimation of SMDPs.
► The formula follows directly from a sensitivity equation in perturbation analysis.
► We develop three gradient estimation algorithms based on the basic formula.
► These algorithms generalize the corresponding algorithms for the Markov case.
► These algorithms require less storage than the existing algorithm.
