Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6897766 | European Journal of Operational Research | 2013 | 9 | |
Abstract
We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that the complexity of computing a policy that maximizes the mean reward under a variance constraint is NP-hard for some cases, and strongly NP-hard for others. We finally offer pseudopolynomial exact and approximation algorithms.
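The constrained problem named in the abstract, maximizing mean cumulative reward subject to a variance constraint, can be written informally as below; the horizon $T$, per-step reward $r_t$, policy $\pi$, and variance budget $c$ are illustrative notation introduced here and not necessarily the paper's own symbols.

```latex
% Illustrative formalization of the variance-constrained mean-maximization
% problem described in the abstract (notation assumed, not taken from the paper):
% choose a policy maximizing the expected cumulative reward over horizon T,
% subject to a budget c on the variance of that same cumulative reward.
\max_{\pi} \; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{T-1} r_t\right]
\quad \text{subject to} \quad
\operatorname{Var}^{\pi}\!\left(\sum_{t=0}^{T-1} r_t\right) \le c
```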
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science (General)
Authors
Shie Mannor, John N. Tsitsiklis