Backward Q-learning: The combination of Sarsa algorithm and Q-learning

Article code	Journal code	Publication year	English article	Full text
380759	1437454	2013	10-page PDF	Free download

Keywords

Q-learning; Reinforcement learning

Related topics

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence

Abstract

• We have presented a new method, called backward Q-learning, which successfully combines the Sarsa algorithm and Q-learning.
• The proposed backward Q-learning can not only enhance learning speed but also improve the final performance.
• The backward Q-learning can be easily combined with other RL algorithms.
• Simulations demonstrate that the backward Q-learning based RL algorithms can indeed enhance learning speed and improve action quality and total performance.

Reinforcement learning (RL) has been applied in many fields, but action selection still faces a dilemma between exploration and exploitation. Q-learning and Sarsa are two well-known RL algorithms with different characteristics. Generally speaking, the Sarsa algorithm converges faster, while the Q-learning algorithm achieves better final performance. However, the Sarsa algorithm is easily stuck in local minima, and Q-learning needs a longer time to learn. Most of the literature investigates the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm and presents a new method, called backward Q-learning, which can be implemented in both the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, which in turn indirectly affect the action selection policy. Therefore, the proposed RL algorithms can enhance learning speed and improve final performance. Finally, three experiments, namely cliff walk, mountain car, and cart–pole balancing control, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations illustrate that the backward Q-learning based RL algorithm outperforms the well-known Q-learning and Sarsa algorithms.
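
To make the contrast in the abstract concrete, here is a minimal tabular sketch of the standard Sarsa and Q-learning update rules. The `backward_replay` helper is only an assumption about what "directly tuning the Q-values" could look like (replaying a stored episode from its last step to its first with Q-learning updates); the abstract does not spell out the procedure, so this should not be read as the authors' algorithm. The function names, the `ALPHA` and `GAMMA` values, and the two-step episode are all illustrative.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (illustrative value, not from the paper)
GAMMA = 0.9  # discount factor (illustrative value, not from the paper)


def sarsa_update(q, s, a, r, s_next, a_next, done):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r if done else r + GAMMA * q[(s_next, a_next)]
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def q_learning_update(q, s, a, r, s_next, actions, done):
    """Off-policy update: bootstrap from the greedy action in s_next."""
    target = r if done else r + GAMMA * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def backward_replay(q, episode, actions):
    """Hypothetical helper: replay stored transitions from last to first,
    applying a Q-learning update to each one."""
    for s, a, r, s_next, done in reversed(episode):
        q_learning_update(q, s, a, r, s_next, actions, done)


# Hand-made two-step episode: (state, action, reward, next_state, done).
actions = [0, 1]
q = defaultdict(float)
episode = [(0, 1, 0.0, 1, False), (1, 1, 1.0, 2, True)]
backward_replay(q, episode, actions)
print(q[(1, 1)], q[(0, 1)])
```

In this toy run the terminal reward reaches the first state-action pair within a single pass, because the last transition is updated before the one that leads to it. A forward pass over the same two transitions would leave `q[(0, 1)]` at zero until the next episode.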

Publisher

Database: Elsevier - ScienceDirect
Journal: Engineering Applications of Artificial Intelligence - Volume 26, Issue 9, October 2013, Pages 2184–2193

Authors

Yin-Hao Wang, Tzuu-Hseng S. Li, Chih-Jui Lin
