Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
474535 | Computers & Mathematics with Applications | 2006 | 6 Pages |
Abstract
We consider utility-constrained Markov decision processes, in which the expected utility of the total discounted reward is maximized subject to multiple expected-utility constraints. By introducing a corresponding Lagrange function, we derive a saddle-point theorem for the utility-constrained optimization problem. The existence of a constrained optimal policy is characterized by optimal action sets specified with a parametric utility.
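The setup described in the abstract can be sketched in symbols as follows. The notation is an assumption made here for illustration and is not taken from the paper: $\pi$ is a policy, $\beta \in (0,1)$ a discount factor, $r$ the one-step reward, $g_0, \dots, g_k$ utility functions, $\alpha_1, \dots, \alpha_k$ constraint levels, and $\lambda = (\lambda_1, \dots, \lambda_k) \ge 0$ Lagrange multipliers. The direction of the constraint inequalities is also assumed.

```latex
% Total discounted reward (assumed notation)
B = \sum_{t=0}^{\infty} \beta^{t}\, r(X_t, A_t)

% Utility-constrained problem
\max_{\pi} \; E_{\pi}\!\left[ g_0(B) \right]
\quad \text{subject to} \quad
E_{\pi}\!\left[ g_i(B) \right] \ge \alpha_i, \qquad i = 1, \dots, k

% Lagrange function
L(\pi, \lambda) = E_{\pi}\!\left[ g_0(B) \right]
  + \sum_{i=1}^{k} \lambda_i \left( E_{\pi}\!\left[ g_i(B) \right] - \alpha_i \right),
\qquad \lambda \ge 0

% Saddle-point condition for a pair (\pi^*, \lambda^*)
L(\pi, \lambda^*) \;\le\; L(\pi^*, \lambda^*) \;\le\; L(\pi^*, \lambda)
\qquad \text{for all } \pi \text{ and all } \lambda \ge 0

% Parametric utility: the Lagrange function is an expected utility minus a constant
g_{\lambda} = g_0 + \sum_{i=1}^{k} \lambda_i\, g_i,
\qquad
L(\pi, \lambda) = E_{\pi}\!\left[ g_{\lambda}(B) \right] - \sum_{i=1}^{k} \lambda_i \alpha_i
```

Under these assumptions, for fixed $\lambda$ maximizing $L(\pi, \lambda)$ over $\pi$ is an unconstrained expected-utility problem with utility $g_{\lambda}$, which is one plausible reading of the "parametric utility" mentioned in the abstract.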