Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
483225 | European Journal of Operational Research | 2007 | 11 Pages |
Abstract
Solving a semi-Markov decision process (SMDP) using value or policy iteration requires precise knowledge of the probabilistic model and suffers from the curse of dimensionality. To overcome these limitations, we present a reinforcement learning approach where one optimises the SMDP performance criterion with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it using stochastic approximation. We apply our algorithm to call admission control. Both the proposed policy gradient SMDP algorithm and its application to admission control are novel.
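To make the idea in the abstract concrete, the following is a minimal sketch of an online policy-gradient update for a toy admission-control SMDP. It is not the authors' algorithm: the single-link model, the logistic admission policy, the eligibility-trace gradient estimate, the constant step size, and every name and parameter value (`admit_prob`, `run`, `lam`, `mu`, `hold_cost`, `beta`, and so on) are assumptions made purely for illustration. The criterion is taken to be average reward per unit time, so each update weights the trace by the reward minus the estimated reward rate times the sojourn time.

```python
import math
import random


def admit_prob(theta, n):
    """Logistic probability of admitting a call when n calls are in progress."""
    x = theta[0] + theta[1] * n
    x = max(-30.0, min(30.0, x))  # clip so exp() stays well behaved
    return 1.0 / (1.0 + math.exp(-x))


def run(steps=20000, capacity=5, lam=2.0, mu=1.0, reward=1.0,
        hold_cost=0.3, step_size=0.01, beta=0.9, seed=0):
    """Simulate the toy link and adapt the policy parameters online.

    All model parameters are hypothetical: arrivals at rate lam, each call
    departs at rate mu, an admitted call earns `reward`, and calls in
    progress incur `hold_cost` per unit time.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # policy parameters
    z = [0.0, 0.0]       # discounted trace of score functions
    total_r = total_t = 0.0
    rho = 0.0            # running estimate of reward per unit time
    n = 0                # calls currently in progress
    for _ in range(steps):
        rate = lam + n * mu
        tau = rng.expovariate(rate)      # sojourn time in the current state
        r = -hold_cost * n * tau         # cost accrued during the sojourn
        z = [beta * zi for zi in z]      # decay the trace every transition
        if rng.random() < lam / rate:    # next event is an arrival
            if n < capacity:
                p = admit_prob(theta, n)
                admit = rng.random() < p
                # Gradient of log-policy w.r.t. the logit: (1 - p) or -p.
                score = (1.0 - p) if admit else -p
                z[0] += score
                z[1] += score * n
                if admit:
                    r += reward
                    n += 1
        else:                            # next event is a departure
            n -= 1
        total_r += r
        total_t += tau
        rho = total_r / total_t
        delta = r - rho * tau            # reward relative to the average rate
        theta = [t + step_size * delta * zi for t, zi in zip(theta, z)]
    return theta, rho
```

Calling `run(seed=0)` returns the adapted parameters together with the estimated reward rate. A decreasing step-size schedule, as is usual in stochastic approximation, could replace the constant `step_size` used here for simplicity.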
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science (General)
Authors
Sumeetpal S. Singh, Vladislav B. Tadić, Arnaud Doucet