Free article download: Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control

Article code	Journal code	Publication year	English article	Full-text version
8407297	1544988	2014	9-page PDF	Free download

English title of the ISI article

Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control

Keywords

Q-learning; Acrobot; Cognitive bias; Bio-inspired computing

Related topics

Engineering and Basic Sciences > Mathematics > Modelling and Simulation

Abstract

Many algorithms and methods in artificial intelligence and machine learning have been inspired by human cognition. The loosely symmetric (LS) value function, which models human causal intuition, was proposed as a mechanism for handling the exploration-exploitation dilemma in reinforcement learning (Shinohara et al., 2007). LS shows the highest correlation with causal induction by humans, and it has been reported to work effectively in multi-armed bandit problems, the simplest class of tasks that exhibit the dilemma. However, the application of LS has so far been limited to reinforcement learning problems with K actions and only one state (K-armed bandit problems). This study proposes the LS-Q learning architecture, which can handle general reinforcement learning tasks with multiple states and delayed rewards. We tested the learning performance of the new architecture on giant-swing robot motion learning, where the uncertainty and unknown-ness of the environment are large. In the test, the agent was given neither ready-made internal models nor function approximation of the state space. The simulations showed that, while the ordinary Q-learning agent fails to reach giant-swing motion because of stagnant loops (local optima with low rewards), LS-Q escapes such loops and acquires giant-swing motion. We confirmed that the smaller the number of states, that is, the more coarse-grained the division of states and the more incomplete the state observation, the better LS-Q performs in comparison with Q-learning. We also showed that the high performance of LS-Q depends comparatively little on parameter tuning and learning time. This suggests that the proposed method, inspired by human cognition, works adaptively in real environments.
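For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of ordinary tabular Q-learning with epsilon-greedy exploration on a task with multiple states and a delayed reward. It is not the authors' LS-Q architecture, and it does not reproduce the LS value function, which the abstract does not define. The chain environment, the function name `q_learning`, and all parameter values are assumptions chosen only for illustration.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    States are 0..n_states-1. Action 1 moves right, action 0 moves left.
    The only reward (1.0) is given on reaching the last state, so the
    reward is delayed with respect to the early actions.
    """
    rng = random.Random(seed)
    n_actions = 2
    goal = n_states - 1
    q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])

            # Toy deterministic dynamics (an assumption, not the Acrobot).
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0

            # Standard Q-learning update; no bootstrapping from the goal.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if s == goal:
                break
    return q


if __name__ == "__main__":
    q_table = q_learning()
    greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
    print("Greedy action per non-goal state:", greedy)
```

In this toy setting the delayed reward propagates backward through the table over repeated episodes. The abstract's claim concerns much harder settings, where coarse state division and incomplete observation cause ordinary Q-learning to stagnate in low-reward loops.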

Publisher

Database: Elsevier - ScienceDirect
Journal: Biosystems - Volume 116, February 2014, Pages 1-9

Authors

Daisuke Uragami, Tatsuji Takahashi, Yoshiki Matsuo
