Free article download: Experiments of conditioned reinforcement learning in continuous space control tasks

Article code	Journal code	Publication year	English article	Full-text version
4946817	1439556	2018	10 pages, PDF	Free download

English title of the ISI article

Experiments of conditioned reinforcement learning in continuous space control tasks

Keywords

Reinforcement learning; Actor-Critic method; Conditioned learning; Wind turbine control

Related topics

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence

Article preview

Abstract

The key issue that prevents application of Reinforcement Learning (RL) methods in complex control scenarios is lack of convergence to meaningful decision policies (i.e. policies that differ significantly from random decisions), due to the huge state-action spaces to be explored. Providing the agent with initial domain knowledge alleviates this problem. This is known as Conditioned RL (CRL). In high-dimensional continuous state-action space and reward domains, CRL is often the only feasible approach to reach meaningful decision policies. In these kinds of systems, RL is carried out by Actor-Critic approaches, and the state-action value functionals are modeled by Value Function Approximations (VFA). CRL methods make use of an existing reference controller, i.e. the teacher controller, which provides the initial domain knowledge to the agent under training. The teacher controller can be used in two ways to build the VFA of the state-action value and state transition functions which determine the action selection policy: (1) providing the desired output for a supervised learning process, or (2) directly using it to build them. We have carried out experiments to compare CRL methods and unconditioned Actor-Critic agents in three different control benchmark scenarios. Results show that both agent conditioning approaches result in significant performance improvements. Under tight computational time constraints, CRL approaches were able to learn efficient policies, while the unconditioned agents were not able to find any acceptable policy in the benchmark control scenarios.
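
As a rough illustration of conditioning approach (1) from the abstract, the sketch below fits an actor's function approximator to the actions of a teacher controller by supervised learning. It is not the authors' implementation: the one-dimensional state, the proportional `teacher_controller`, the Gaussian radial-basis features, and every name and parameter value are assumptions chosen only to keep the example small.

```python
import numpy as np

def rbf_features(states, centers, width):
    """Gaussian RBF features; returns an array of shape (n_states, n_centers)."""
    states = np.atleast_1d(np.asarray(states, dtype=float))
    diff = states[:, None] - centers[None, :]
    return np.exp(-(diff ** 2) / (2.0 * width ** 2))

def teacher_controller(states, gain=2.0):
    """Hypothetical reference controller: proportional feedback toward 0."""
    return -gain * np.asarray(states, dtype=float)

def condition_actor(centers, width, train_states):
    """Fit linear actor weights by least squares against the teacher's actions."""
    features = rbf_features(train_states, centers, width)
    targets = teacher_controller(train_states)  # desired outputs from the teacher
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights

def actor(states, weights, centers, width):
    """Deterministic actor: linear combination of RBF features."""
    return rbf_features(states, centers, width) @ weights

# Assumed setup: scalar state in [-1, 1], 16 evenly spaced RBF centers.
centers = np.linspace(-1.5, 1.5, 16)
width = 0.2
train_states = np.linspace(-1.0, 1.0, 200)
weights = condition_actor(centers, width, train_states)
```

After this step the actor starts from a policy close to the teacher's instead of from arbitrary weights, and an Actor-Critic learner could then continue to adjust `weights` from reward feedback.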

Publisher

Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 271, 3 January 2018, Pages 38-47

Authors

Borja Fernandez-Gauna, Juan Luis Osa, Manuel Graña
