Article code: 731934
Journal code: 893188
Publication year: 2014
English article: 9-page PDF
Full-text version: Free download
English title of the ISI article
Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy
Related topics
Engineering and Basic Sciences, Other Engineering Disciplines, Control and Systems Engineering
English abstract

Reinforcement learning (RL) is a framework that enables a controller to find an optimal control policy for a task in an unknown environment. Although RL has been successfully used to solve optimal control problems, learning is generally slow. The main causes are the inefficient use of information collected during interaction with the system and the inability to use prior knowledge about the system or the control task. In addition, the learning speed depends heavily on the learning rate parameter, which is difficult to tune. In this paper, we present a sample-efficient, learning-rate-free version of the Value-Gradient Based Policy (VGBP) algorithm. The main difference between VGBP and other frequently used algorithms, such as Sarsa, is that in VGBP the learning agent has direct access to the reward function, rather than just the immediate reward values. Furthermore, the agent learns a process model. This enables the algorithm to select control actions by optimizing over the right-hand side of the Bellman equation. We demonstrate fast learning convergence in simulations and experiments with the underactuated pendulum swing-up task. In addition, we present experimental results for a more complex 2-DOF robotic manipulator.
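
The action-selection idea in the abstract, choosing the action that maximizes the right-hand side of the Bellman equation, r(x, u) + gamma * V(f(x, u)), can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `greedy_action`, `reward_fn`, `model_fn` and `value_fn`, the discount factor, and the toy 1-D system are hypothetical, and the brute-force search over a discretized action set is a simple stand-in rather than the value-gradient computation used by VGBP.

```python
def greedy_action(x, actions, reward_fn, model_fn, value_fn, gamma=0.97):
    """Pick the action maximizing r(x, u) + gamma * V(f(x, u)).

    reward_fn: known reward function r(x, u)
    model_fn:  process model f(x, u) predicting the next state
    value_fn:  current estimate of the value function V(x)
    """
    def bellman_rhs(u):
        # One-step lookahead through the model, scored by reward plus
        # discounted value of the predicted next state.
        return reward_fn(x, u) + gamma * value_fn(model_fn(x, u))

    return max(actions, key=bellman_rhs)


# Hypothetical toy problem: drive a scalar state toward zero.
def reward_fn(x, u):
    return -(x ** 2) - 0.1 * (u ** 2)   # penalize state error and effort


def model_fn(x, u):
    return x + 0.5 * u                  # hand-written stand-in for a learned model


def value_fn(x):
    return -2.0 * x ** 2                # hand-written stand-in for a learned critic


actions = [-1.0, -0.5, 0.0, 0.5, 1.0]   # discretized action set (assumption)
u_star = greedy_action(1.0, actions, reward_fn, model_fn, value_fn)
print(u_star)
```

In this toy setting the selected action pushes the state toward zero. In a learning agent, `model_fn` and `value_fn` would be function approximators updated from data instead of fixed formulas.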

Publisher
Database: Elsevier - ScienceDirect
Journal: Mechatronics - Volume 24, Issue 8, December 2014, Pages 966–974