Article code: 404548
Journal code: 677434
Publication year: 2009
English article: 14 pages
Full-text version: PDF
English title of the ISI article
Real-time reinforcement learning by sequential Actor–Critics and experience replay
Related subjects
Engineering and Basic Sciences › Computer Engineering › Artificial Intelligence
English abstract

Actor–Critics constitute an important class of reinforcement learning algorithms that can deal with continuous actions and states in an easy and natural way. This paper shows how these algorithms can be augmented by the technique of experience replay without degrading their convergence properties, by appropriately estimating the policy change direction. This is achieved by truncated importance sampling applied to the recorded past experiences. It is formally shown that the resulting estimation bias is bounded and asymptotically vanishes, which allows the experience-replay-augmented algorithm to preserve the convergence properties of the original algorithm. The technique of experience replay makes it possible to utilize the available computational power to reduce the required number of interactions with the environment considerably, which is essential for real-world applications. Experimental results demonstrate that the combination of experience replay and Actor–Critics yields extremely fast learning algorithms that achieve successful policies for non-trivial control tasks in a remarkably short time. Namely, the policies for the cart-pole swing-up [Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation, 12(1), 219–245] are obtained after as little as 20 min of cart-pole time, and the policy for Half-Cheetah (a walking robot with 6 degrees of freedom) is obtained after four hours of Half-Cheetah time.
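The abstract describes the core mechanism only at a high level: replayed transitions are re-weighted by truncated importance ratios between the current policy and the policy that generated them, so that the actor and critic updates still estimate the policy-change direction under the current policy. The sketch below illustrates this idea for a linear-Gaussian actor and a linear critic; all names (replay_update, TRUNCATION_CAP, the step sizes) and the concrete update rules are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

TRUNCATION_CAP = 5.0  # upper bound b on the importance weight (assumed value)

def gaussian_logpdf(a, mean, std):
    """Log-density of a scalar Gaussian policy."""
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def replay_update(theta, v, replay_buffer, alpha=1e-3, beta=1e-2,
                  gamma=0.99, std=0.3):
    """One experience-replay pass over stored transitions.

    Each transition (s, a, r, s_next, logp_behavior) is re-weighted by a
    truncated importance weight so the updates approximate the gradient
    under the *current* policy rather than the behavior policy.
    """
    for s, a, r, s_next, logp_behavior in replay_buffer:
        mean = theta @ s  # linear-Gaussian actor (illustrative assumption)
        logp_current = gaussian_logpdf(a, mean, std)
        # Truncated importance weight: capping at b bounds the estimator's
        # variance at the cost of a bias that is bounded and vanishes.
        rho = min(np.exp(logp_current - logp_behavior), TRUNCATION_CAP)
        td_error = r + gamma * (v @ s_next) - (v @ s)  # linear critic
        grad_logp = ((a - mean) / std ** 2) * s        # policy score function
        theta = theta + alpha * rho * td_error * grad_logp  # actor step
        v = v + beta * rho * td_error * s                   # critic step
    return theta, v
```

Truncating the weight at a fixed cap is what makes replay safe here: an unbounded ratio would let the estimator's variance explode for experiences far from the current policy, while the truncation introduces only a bounded bias, which the paper shows vanishes asymptotically and therefore preserves the original algorithm's convergence properties.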

Publisher
Database: Elsevier - ScienceDirect
Journal: Neural Networks - Volume 22, Issue 10, December 2009, Pages 1484–1497
Authors
Paweł Wawrzyński