Article code: 410664
Journal code: 679154
Year of publication: 2008
English article, full-text version: 4 pages, PDF
English title of the ISI article
On the bias of batch Bellman residual minimisation
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

This letter addresses the problem of Bellman residual minimisation in reinforcement learning for the model-free batch case. We prove the simple, but not necessarily obvious, result that no unbiased estimate of the Bellman residual exists for a single trajectory of observations. We further pick up the recent suggestion of Antos et al. [Learning near-optimal policies with Bellman-residual minimisation based fitted policy iteration and a single sample path, in: COLT, 2006, pp. 574–588] for approximative Bellman residual minimisation and discuss its properties concerning consistency, biasedness, and optimality. We finally give a suggestion to improve the optimality.
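
The bias mentioned in the abstract can be illustrated with a small exact calculation. The sketch below is not taken from the letter: it uses the standard textbook observation that the expected squared one-sample temporal-difference error equals the squared Bellman residual plus gamma^2 times the variance of the successor value, whereas the product of two independent successor samples (which a single trajectory does not provide) has the squared residual as its expectation. The one-state model, the numbers, and all names (`GAMMA`, `REWARD`, `V_S`, `SUCCESSORS`, and the functions) are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical one-state example; every value here is an illustrative assumption.
GAMMA = 0.9                             # discount factor
REWARD = 0.0                            # deterministic reward for leaving state s
V_S = 1.0                               # candidate value V(s)
SUCCESSORS = [(0.5, 0.0), (0.5, 2.0)]   # (transition probability, V(s'))


def td_error(v_next):
    """One-sample temporal-difference error r + gamma * V(s') - V(s)."""
    return REWARD + GAMMA * v_next - V_S


def true_squared_residual():
    """Squared Bellman residual: the square of the EXPECTED TD error."""
    expected_td = sum(p * td_error(v) for p, v in SUCCESSORS)
    return expected_td ** 2


def expected_single_sample_estimate():
    """Expectation of the squared TD error built from one successor sample."""
    return sum(p * td_error(v) ** 2 for p, v in SUCCESSORS)


def expected_double_sample_estimate():
    """Expectation of the product of two independent TD errors from state s."""
    return sum(
        p1 * p2 * td_error(v1) * td_error(v2)
        for (p1, v1), (p2, v2) in product(SUCCESSORS, repeat=2)
    )


def successor_value_variance():
    """Variance of V(s') under the transition distribution."""
    mean = sum(p * v for p, v in SUCCESSORS)
    return sum(p * (v - mean) ** 2 for p, v in SUCCESSORS)


if __name__ == "__main__":
    print("squared Bellman residual:  ", true_squared_residual())
    print("E[single-sample estimate]: ", expected_single_sample_estimate())
    print("E[double-sample estimate]: ", expected_double_sample_estimate())
    print("gamma^2 * Var[V(s')]:      ", GAMMA ** 2 * successor_value_variance())
```

With these assumed numbers the single-sample expectation is about 0.82 while the squared residual is about 0.01; the gap is the gamma^2 * Var[V(s')] term. The expectations are computed exactly rather than by simulation so that the identity can be checked without sampling noise.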

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 72, Issues 7–9, March 2009, Pages 2005–2008
Authors