Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
430494 | Journal of Computer and System Sciences | 2008 | 23 Pages | |
Abstract
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less “online” cousins from the literature.
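The abstract names MBIE-EB but does not state its update rule. As a rough illustration only, the sketch below assumes the commonly cited form of the exploration-bonus backup: the empirical mean reward, plus the discounted empirical next-state value, plus a bonus of β/√n(s,a) that shrinks as a state-action pair is visited more often. The function name `mbie_eb_q_values`, its parameters, the default constants, and the optimistic value assigned to unvisited pairs are all assumptions made for this example and are not taken from the paper.

```python
import math


def mbie_eb_q_values(counts, reward_sums, next_counts,
                     gamma=0.95, beta=1.0, r_max=1.0, iters=500):
    """Value iteration on an empirical model with a count-based bonus.

    counts[s][a]          -- number of times action a was tried in state s
    reward_sums[s][a]     -- sum of rewards observed for (s, a)
    next_counts[s][a][s2] -- number of observed transitions (s, a) -> s2
    All names and defaults are hypothetical, chosen for illustration.
    """
    n_states = len(counts)
    n_actions = len(counts[0])
    # Assumption: unvisited pairs get the largest possible discounted return.
    q_max = r_max / (1.0 - gamma)
    q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(iters):
        v = [max(row) for row in q]  # greedy state values
        new_q = [[q_max] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                n = counts[s][a]
                if n == 0:
                    continue  # keep the optimistic default
                r_hat = reward_sums[s][a] / n
                expected_next = sum(
                    next_counts[s][a][s2] / n * v[s2]
                    for s2 in range(n_states)
                )
                bonus = beta / math.sqrt(n)  # exploration bonus
                new_q[s][a] = r_hat + gamma * expected_next + bonus
        q = new_q
    return q
```

In this sketch, two actions with identical empirical rewards and transitions differ in value only through the bonus, so a greedy agent prefers the less-tried one. That preference is how the bonus drives exploration.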
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics