Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
1143393 | Operations Research Letters | 2009 | 5 Pages |
Abstract
This paper deals with the bias optimality of multichain models for finite continuous-time Markov decision processes. Based on new performance difference formulas developed here, we prove the convergence of a so-called bias-optimal policy iteration algorithm, which can be used to obtain bias-optimal policies in a finite number of iterations.
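As a small illustration of the two quantities behind bias optimality, the sketch below computes the gain (long-run average reward) and the bias (the transient correction solving the Poisson equation r - g + Qh = 0, normalised so that the stationary average of h is zero) for a fixed policy. It is not the paper's algorithm. It assumes a two-state ergodic chain, which is far simpler than the multichain setting the abstract addresses. The function name `gain_and_bias`, the rates `a` and `b`, and the rewards `r0` and `r1` are hypothetical and chosen only for this example.

```python
def gain_and_bias(a, b, r0, r1):
    """Gain and bias of a two-state ergodic continuous-time Markov chain.

    Assumed (hypothetical) setup: generator Q = [[-a, a], [b, -b]] with
    a, b > 0, and reward rates r0, r1 in states 0 and 1.
    """
    total = a + b
    # Stationary distribution: solves pi Q = 0 with pi summing to 1.
    pi = (b / total, a / total)
    # Gain: long-run average reward, identical for both states here.
    g = pi[0] * r0 + pi[1] * r1
    # Row 0 of the Poisson equation r - g + Q h = 0 gives h1 - h0.
    d = (r1 - r0) / total
    # Normalisation pi . h = 0 pins down the individual bias values.
    h = (-pi[1] * d, pi[0] * d)
    return g, h


if __name__ == "__main__":
    g, h = gain_and_bias(1.0, 3.0, 4.0, 0.0)
    print("gain:", g)
    print("bias:", h)
```

In this standard usage, a policy is called bias optimal if it maximises the bias among the policies that already attain the optimal gain, so a policy iteration scheme of the kind the abstract describes has to compare both quantities across policies.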
Related Topics
Physical Sciences and Engineering
Mathematics
Discrete Mathematics and Combinatorics
Authors
Xianping Guo, XinYuan Song, Junyu Zhang