| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 480129 | European Journal of Operational Research | 2013 | 7 Pages | |
This paper presents a basic formula for performance gradient estimation of semi-Markov decision processes (SMDPs) under the average-reward criterion. The formula follows directly from a sensitivity equation in perturbation analysis. With this formula, we develop three sample-path-based gradient estimation algorithms that use a single sample path. These algorithms naturally extend many gradient estimation algorithms for discrete-time Markov systems to continuous-time semi-Markov models. In particular, they require less storage than the existing algorithm in the literature.
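As a point of reference only, the sketch below illustrates the discrete-time Markov special case of this kind of potential-based, single-sample-path gradient estimation, where the average-reward derivative is estimated via the classical sensitivity relation dη/dθ = π (dP/dθ) g, with π the stationary distribution and g the performance potentials. The parameterized chain, the helper names, and all numerical values are hypothetical and are not taken from the paper, which treats the more general semi-Markov (random sojourn time) setting.

```python
import numpy as np

def transition_matrix(theta):
    """Hypothetical 3-state parameterized chain; rows sum to 1 for theta in (0, 0.5)."""
    return np.array([
        [0.5 - theta, 0.3 + theta, 0.2],
        [0.2,         0.5,         0.3],
        [0.3 + theta, 0.2,         0.5 - theta],
    ])

def transition_derivative():
    """dP/dtheta for the hypothetical chain above (constant in theta here)."""
    return np.array([
        [-1.0, 1.0,  0.0],
        [ 0.0, 0.0,  0.0],
        [ 1.0, 0.0, -1.0],
    ])

def estimate_gradient(theta, f, path_len=100_000, horizon=50, seed=0):
    """Single-sample-path estimate of d(eta)/d(theta) using the potential-based
    sensitivity formula  d(eta)/d(theta) = sum_i pi_i * sum_j dP_ij * g_j."""
    rng = np.random.default_rng(seed)
    P, dP = transition_matrix(theta), transition_derivative()
    n = P.shape[0]

    # Simulate one long sample path of the chain.
    x = np.empty(path_len, dtype=int)
    x[0] = 0
    for t in range(path_len - 1):
        x[t + 1] = rng.choice(n, p=P[x[t]])

    rewards = f[x]
    eta = rewards.mean()  # average-reward estimate from the same path

    # Estimate potentials g_j by truncated sums of (f - eta) following visits to j.
    g_sum = np.zeros(n)
    g_cnt = np.zeros(n)
    for t in range(path_len - horizon):
        j = x[t]
        g_sum[j] += (rewards[t:t + horizon] - eta).sum()
        g_cnt[j] += 1
    g = g_sum / np.maximum(g_cnt, 1)

    pi = np.bincount(x, minlength=n) / path_len  # visit frequencies approximate pi
    return pi @ dP @ g, eta

f = np.array([1.0, 0.0, 2.0])  # hypothetical state rewards
grad, eta = estimate_gradient(theta=0.1, f=f)
print(f"estimated average reward {eta:.4f}, estimated gradient {grad:.4f}")
```

Everything here, including the truncation horizon for the potentials and the frequency estimate of π, is a generic illustration of single-path gradient estimation; the paper's algorithms differ in that they handle semi-Markov sojourn times and are designed to need less storage than the existing algorithm.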
► We present a basic formula for gradient estimation of SMDPs.
► The formula follows directly from a sensitivity equation in perturbation analysis.
► We develop three gradient estimation algorithms based on the basic formula.
► These algorithms generalize their discrete-time Markov counterparts.
► These algorithms require less storage than the existing algorithm.