An analysis of model-based Interval Estimation for Markov Decision Processes

Journal of Computer and System Sciences, 2008 (23 pages). Article ID: 430494.

Abstract

Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less “online” cousins from the literature.
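The abstract names MBIE-EB without describing it. As we understand the method, MBIE-EB drops MBIE's explicit confidence intervals. Instead, it adds an exploration bonus beta / sqrt(n(s,a)) to the empirical reward when solving the Bellman equations of the estimated model: Q(s,a) = R_hat(s,a) + gamma * sum_{s'} T_hat(s'|s,a) * max_{a'} Q(s',a') + beta / sqrt(n(s,a)). The sketch below illustrates that idea for a small tabular MDP. It rests on several assumptions. Rewards lie in [0, r_max]. beta is a free parameter, not the value derived in the paper. State-action pairs never tried are assigned the optimistic value r_max / (1 - gamma). Value iteration runs for a fixed number of sweeps. The function name `mbie_eb_q_values` and its parameters are hypothetical.

```python
import math


def mbie_eb_q_values(counts, reward_sums, next_counts, gamma=0.95, beta=0.5,
                     r_max=1.0, iterations=500):
    """Optimistic Q-values with a count-based exploration bonus (MBIE-EB-style sketch).

    counts[s][a]:         number of times action a was taken in state s
    reward_sums[s][a]:    sum of rewards observed for (s, a)
    next_counts[s][a][t]: number of observed transitions (s, a) -> t
    """
    n_states = len(counts)
    n_actions = len(counts[0])
    # Assumed optimistic value for pairs that have never been tried.
    v_max = r_max / (1.0 - gamma)
    q = [[v_max] * n_actions for _ in range(n_states)]

    for _ in range(iterations):
        # Greedy state values under the current Q estimate.
        v = [max(row) for row in q]
        new_q = [[v_max] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                n = counts[s][a]
                if n == 0:
                    continue  # untried pair keeps its optimistic value
                r_hat = reward_sums[s][a] / n  # empirical mean reward
                # Expected next-state value under the empirical transition model.
                expected_next = sum(c * v[t] for t, c in enumerate(next_counts[s][a])) / n
                # Bellman backup plus the exploration bonus beta / sqrt(n).
                new_q[s][a] = r_hat + gamma * expected_next + beta / math.sqrt(n)
        q = new_q
    return q
```

An agent built on this sketch would act greedily with respect to the returned Q-values and recompute them as counts accumulate. Because the bonus shrinks as 1/sqrt(n), rarely tried actions look attractive until enough experience has been gathered to trust their estimates.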