Article code: 4946817
Journal code: 1439556
Publication year: 2018
English article: 10-page PDF
Full-text version: free download
English title of the ISI article
Experiments of conditioned reinforcement learning in continuous space control tasks
Persian translation of the title
آزمایشات یادگیری تقویت کننده شرطی در وظایف کنترل فضای مداوم
Keywords
Reinforcement learning; Actor-Critic method; Conventional learning; Wind turbine control
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract

The key issue that prevents the application of Reinforcement Learning (RL) methods in complex control scenarios is the lack of convergence to meaningful decision policies (i.e., policies that differ significantly from random decisions), due to the huge state-action spaces to be explored. Providing the agent with initial domain knowledge alleviates this problem. This is known as Conditioned RL (CRL). In high-dimensional continuous state-action space and reward domains, CRL is often the only feasible approach to reach meaningful decision policies. In these kinds of systems, RL is carried out by Actor-Critic approaches, and the state-action value functionals are modeled by Value Function Approximations (VFA). CRL methods make use of an existing reference controller, i.e., the teacher controller, which provides the initial domain knowledge to the agent under training. The teacher controller can be used in two ways to build the VFA of the state-action value and state transition functions which determine the action selection policy: (1) providing the desired output for a supervised learning process, or (2) directly using it to build them. We have carried out experiments to compare CRL methods and unconditioned Actor-Critic agents in three different control benchmark scenarios. Results show that both agent conditioning approaches result in significant performance improvements. Under tight computational time constraints, CRL approaches were able to learn efficient policies, while the unconditioned agents were not able to find any acceptable policy in the benchmark control scenarios.
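As a rough illustration of conditioning approach (1), the sketch below fits an actor to the outputs of a teacher controller by supervised learning before any reinforcement learning would take place. It is not the authors' implementation: the proportional `teacher_controller`, the `LinearActor` class, the one-dimensional state, and the learning rate are all assumptions chosen to keep the example small.

```python
import random


def teacher_controller(state: float) -> float:
    """Hypothetical reference controller (a proportional law), assumed for illustration."""
    return -2.0 * state


class LinearActor:
    """Deterministic policy a = w * s + b, standing in for the actor's approximator."""

    def __init__(self) -> None:
        self.w = 0.0
        self.b = 0.0

    def act(self, state: float) -> float:
        return self.w * state + self.b

    def supervised_update(self, state: float, target: float, lr: float) -> float:
        # One gradient step on the squared error 0.5 * (act(state) - target) ** 2.
        error = self.act(state) - target
        self.w -= lr * error * state
        self.b -= lr * error
        return 0.5 * error * error


def pretrain_actor(actor: LinearActor, steps: int = 2000,
                   lr: float = 0.05, seed: int = 0) -> LinearActor:
    """Condition the actor on the teacher: the teacher's action is the desired output."""
    rng = random.Random(seed)
    for _ in range(steps):
        state = rng.uniform(-1.0, 1.0)  # sample a state from the assumed operating range
        actor.supervised_update(state, teacher_controller(state), lr)
    return actor


actor = pretrain_actor(LinearActor())
```

After this step the actor reproduces the teacher's decisions on the sampled range, so a subsequent Actor-Critic phase would start from a meaningful policy rather than from random decisions.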

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 271, 3 January 2018, Pages 38-47