Article code: 404548
Journal code: 677434
Publication year: 2009
English article: 14 pages
Full-text version: PDF
English title of the ISI article
Real-time reinforcement learning by sequential Actor–Critics and experience replay
Related subjects
Engineering and Basic Sciences › Computer Engineering › Artificial Intelligence
English abstract

Actor–Critics constitute an important class of reinforcement learning algorithms that can deal with continuous actions and states in an easy and natural way. This paper shows how these algorithms can be augmented by the technique of experience replay without degrading their convergence properties, by appropriately estimating the policy change direction. This is achieved by truncated importance sampling applied to the recorded past experiences. It is formally shown that the resulting estimation bias is bounded and asymptotically vanishes, which allows the experience-replay-augmented algorithm to preserve the convergence properties of the original algorithm. The technique of experience replay makes it possible to utilize the available computational power to reduce the required number of interactions with the environment considerably, which is essential for real-world applications. Experimental results demonstrate that the combination of experience replay and Actor–Critics yields extremely fast learning algorithms that achieve successful policies for non-trivial control tasks in a remarkably short time. Namely, the policies for the cart-pole swing-up [Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation, 12(1), 219–245] are obtained after as little as 20 min of cart-pole time, and the policy for Half-Cheetah (a walking robot with 6 degrees of freedom) is obtained after four hours of Half-Cheetah time.
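The abstract describes the core mechanism only at a high level: replayed transitions are re-weighted by truncated importance ratios between the current policy and the policy that generated them, so that the actor and critic updates still estimate the policy-change direction under the current policy. The sketch below illustrates this idea for a linear-Gaussian actor and a linear critic; all names (replay_update, TRUNCATION_CAP, the step sizes) and the concrete update rules are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

TRUNCATION_CAP = 5.0  # upper bound b on the importance weight (assumed value)

def gaussian_logpdf(a, mean, std):
    """Log-density of a scalar Gaussian policy."""
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def replay_update(theta, v, replay_buffer, alpha=1e-3, beta=1e-2,
                  gamma=0.99, std=0.3):
    """One experience-replay pass over stored transitions.

    Each transition (s, a, r, s_next, logp_behavior) is re-weighted by a
    truncated importance weight so the updates approximate the gradient
    under the *current* policy rather than the behavior policy.
    """
    for s, a, r, s_next, logp_behavior in replay_buffer:
        mean = theta @ s  # linear-Gaussian actor (illustrative assumption)
        logp_current = gaussian_logpdf(a, mean, std)
        # Truncated importance weight: capping at b bounds the estimator's
        # variance at the cost of a bias that is bounded and vanishes.
        rho = min(np.exp(logp_current - logp_behavior), TRUNCATION_CAP)
        td_error = r + gamma * (v @ s_next) - (v @ s)  # linear critic
        grad_logp = ((a - mean) / std ** 2) * s        # policy score function
        theta = theta + alpha * rho * td_error * grad_logp  # actor step
        v = v + beta * rho * td_error * s                   # critic step
    return theta, v
```

Truncating the weight at a fixed cap is what makes replay safe here: an unbounded ratio would let the estimator's variance explode for experiences far from the current policy, while the truncation introduces only a bounded bias, which the paper shows vanishes asymptotically and therefore preserves the original algorithm's convergence properties.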

Publisher
Database: Elsevier - ScienceDirect
Journal: Neural Networks - Volume 22, Issue 10, December 2009, Pages 1484–1497
Authors
Paweł Wawrzyński