Article code: 407264
Journal code: 678134
Publication year: 2010
English article: 10 pages, PDF
Full-text version: free download
English title of the ISI article
Online learning of shaping rewards in reinforcement learning
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function that is used to shape the reward given to the learning agent. In this paper, we show how, in the absence of knowledge to define the potential function manually, this function can be learned online in parallel with the actual reinforcement learning process. Two cases are considered. The first solution, which is based on multi-grid discretisation, is designed for model-free reinforcement learning. In the second case, an approach for the prototypical model-based R-max algorithm is proposed; it learns the potential function using the free space assumption about the transitions in the environment. Two novel algorithms are presented and evaluated.
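
As a rough illustration of the idea in the abstract, the sketch below adds the standard potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s) to tabular Q-learning, with the potential Phi learned online as a TD(0) value estimate over a coarser discretisation of the same states. It is a minimal sketch under stated assumptions, not the paper's algorithms: the corridor environment, the constants, and the names coarse, shaping_reward, step, and train are all hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict

GAMMA = 0.99  # discount factor (assumed value)
ALPHA = 0.1   # step size for the fine-grained Q-table (assumed value)
BETA = 0.1    # step size for the coarse potential table (assumed value)
SIZE = 10     # toy corridor: states 0..SIZE, goal at SIZE (hypothetical task)
BLOCK = 5     # each coarse cell covers BLOCK adjacent fine states (assumed)
ACTIONS = (-1, 1)


def coarse(state):
    """Map a fine state to its coarse cell (hypothetical abstraction)."""
    return state // BLOCK


def shaping_reward(potential, s, s_next, gamma=GAMMA):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * potential[coarse(s_next)] - potential[coarse(s)]


def step(state, action):
    """Toy dynamics: move left or right, reward 1.0 on reaching the goal."""
    s_next = min(max(state + action, 0), SIZE)
    done = s_next == SIZE
    return s_next, (1.0 if done else 0.0), done


def train(episodes=200, seed=0, epsilon=0.1, max_steps=200):
    """Run Q-learning on fine states while learning Phi on coarse cells."""
    rng = random.Random(seed)
    q = defaultdict(float)          # fine-grained action values
    potential = defaultdict(float)  # coarse state values, used as Phi
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                # Greedy action with random tie-breaking.
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            s_next, reward, done = step(state, action)

            # TD(0) update of the coarse potential from the same experience.
            z = coarse(state)
            target = reward if done else reward + GAMMA * potential[coarse(s_next)]
            potential[z] += BETA * (target - potential[z])

            # Q-learning update on the environment reward plus the shaping term.
            f = shaping_reward(potential, state, s_next)
            best_next = 0.0 if done else max(q[(s_next, a)] for a in ACTIONS)
            td_error = reward + f + GAMMA * best_next - q[(state, action)]
            q[(state, action)] += ALPHA * td_error

            state = s_next
            if done:
                break
    return q, potential


if __name__ == "__main__":
    q_table, phi = train()
    print({cell: round(value, 3) for cell, value in sorted(phi.items())})
```

Calling train() returns both tables. In this toy setup the goal's coarse cell keeps a potential of zero, because it is never the source of a transition and so is never updated.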

Publisher
Database: Elsevier - ScienceDirect
Journal: Neural Networks - Volume 23, Issue 4, May 2010, Pages 541–550