| Article ID | Journal | Published Year | Pages |
| --- | --- | --- | --- |
| 696923 | Automatica | 2012 | 7 |
Abstract
We present an online simulation-based algorithm called Approximate Stochastic Annealing (ASA) for solving infinite-horizon Markov decision processes (MDPs) with finite state and action spaces. At each iteration, the algorithm samples a policy from a probability distribution over the policy space and updates that distribution using Q-function estimates obtained via a Q-learning-type recursion. By exploiting a novel connection between ASA and the stochastic approximation method, we show that the sequence of distributions generated by the algorithm converges to a degenerate distribution concentrated on the optimal policy. Numerical examples are also provided to illustrate the algorithm.
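The abstract's outline of ASA, sampling a policy from a distribution and then annealing that distribution toward policies ranked highly by Q-learning estimates, can be illustrated with a minimal sketch. This is not the paper's exact recursion: the toy self-loop MDP, the Boltzmann-style distribution update, and the temperature and step-size schedules below are simplified stand-ins chosen for readability.

```python
import random, math

# Toy 2-state, 2-action MDP (illustrative stand-in, not from the paper):
# action 1 yields reward 1, action 0 yields 0; every state self-loops.
S, A = 2, 2
GAMMA = 0.9

def step(s, a):
    return s, (1.0 if a == 1 else 0.0)  # (next state, reward)

random.seed(0)
Q = [[0.0] * A for _ in range(S)]           # Q-function estimates
theta = [[1.0 / A] * A for _ in range(S)]   # per-state policy distribution

for k in range(1, 2001):
    # Sample a deterministic policy from the current distribution.
    policy = [random.choices(range(A), weights=theta[s])[0] for s in range(S)]
    # Q-learning-type update along the sampled policy.
    alpha = 1.0 / k  # decreasing step size (simplified schedule)
    for s in range(S):
        a = policy[s]
        ns, r = step(s, a)
        Q[s][a] += alpha * (r + GAMMA * max(Q[ns]) - Q[s][a])
    # Anneal the distribution toward a Boltzmann fit of the Q estimates,
    # with a temperature that decreases over iterations.
    T = 1.0 / math.log(k + 2)
    for s in range(S):
        z = [math.exp(Q[s][a] / T) for a in range(A)]
        total = sum(z)
        theta[s] = [w / total for w in z]

# As the temperature falls, theta concentrates on the greedy (optimal) policy.
greedy = [max(range(A), key=lambda a: Q[s][a]) for s in range(S)]
```

In this toy run the distribution `theta` places nearly all of its mass on action 1 in every state, mirroring the paper's convergence claim that the policy distribution degenerates onto the optimal policy.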
Related Topics
Physical Sciences and Engineering
Engineering
Control and Systems Engineering
Authors
Jiaqiao Hu, Hyeong Soo Chang