Article code: 391495
Journal code: 661845
Publication year: 2015
English article: 23 pages, PDF
Full-text version: free download
English title of the ISI article
Reinforcement Learning endowed with safe veto policies to learn the control of Linked-Multicomponent Robotic Systems
Persian translation of the title
تقویت یادگیری با سیاست های حقوقی امن برای یادگیری کنترل سیستم های روبوتیک متصل چندگانه
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Reinforcement learning (RL)-based control of systems whose state space contains many Undesired Terminal States (UTS) suffers from severe convergence problems. We define UTS as terminal states without associated positive reward information. They appear in the training of over-constrained systems, where breaking a constraint means that all the effort invested during a learning episode is lost without gathering any constructive information about how to achieve the target task. The random exploration performed by RL algorithms is unfruitful until the system reaches a final state bearing some reward that can be used to update the state-action value functions; hence UTS seriously impede the convergence of the learning process. The most efficient learning strategies avoid reaching any UTS, ensuring that each episode of the learning process provides useful reward information. Safe Modular State Action Veto (Safe-MSAV) policies learn specifically how to avoid state transitions leading to a UTS. Applying Safe-MSAV makes state space exploration much more efficient, and the larger the ratio of UTS to the total number of states, the greater the improvement. Safe-MSAV uses independent concurrent modules, each dealing with a separate kind of UTS. We report experiments on the control of Linked Multicomponent Robotic Systems (L-MCRS) showing a dramatic decrease in the computational resources required, with faster and more accurate results than conventional exploration strategies that do not implement explicit mechanisms to avoid falling into UTS.
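
The veto idea in the abstract can be illustrated with a small sketch. It is not the authors' Safe-MSAV algorithm: it assumes a deterministic toy grid world, plain tabular Q-learning, and veto modules that simply memorize the state-action pairs that ended an episode in a UTS. All names (`VetoModule`, `safe_actions`, `train`, the grid layout, the hyperparameters) are hypothetical and chosen only for illustration. There is one module per kind of UTS, and an action is explored only if no module vetoes it.

```python
import random
from collections import defaultdict

# Hypothetical toy world: a 4x4 grid with two kinds of UTS
# (leaving the grid, stepping into a pit) and a single rewarded goal.
SIZE = 4
GOAL = (3, 3)
PITS = {(1, 1), (2, 3)}
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


class VetoModule:
    """Memorizes state-action pairs that led to one kind of UTS (an assumption,
    standing in for the learned veto modules mentioned in the abstract)."""

    def __init__(self, kind):
        self.kind = kind
        self.vetoed = set()

    def record(self, state, action):
        self.vetoed.add((state, action))

    def allows(self, state, action):
        return (state, action) not in self.vetoed


def safe_actions(state, actions, modules):
    """Keep only the actions that no module vetoes; fall back to all actions."""
    allowed = [a for a in actions if all(m.allows(state, a) for m in modules)]
    return allowed or list(actions)


def step(state, action):
    """Return (next_state, reward, done, uts_kind); uts_kind is None if no UTS."""
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE):
        return nxt, 0.0, True, "out_of_bounds"
    if nxt in PITS:
        return nxt, 0.0, True, "pit"
    if nxt == GOAL:
        return nxt, 1.0, True, None
    return nxt, 0.0, False, None


def train(episodes=300, use_veto=True, seed=0,
          alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50):
    """Epsilon-greedy Q-learning; returns (q_table, modules, number_of_uts_hits)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    modules = {kind: VetoModule(kind) for kind in ("out_of_bounds", "pit")}
    uts_count = 0
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_steps):
            if use_veto:
                candidates = safe_actions(state, list(ACTIONS), modules.values())
            else:
                candidates = list(ACTIONS)
            if rng.random() < epsilon:
                action = rng.choice(candidates)
            else:
                best = max(q[(state, a)] for a in candidates)
                action = rng.choice([a for a in candidates if q[(state, a)] == best])
            nxt, reward, done, uts = step(state, action)
            if uts is not None:
                uts_count += 1
                modules[uts].record(state, action)  # each module handles one UTS kind
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            if done:
                break
            state = nxt
    return q, modules, uts_count


if __name__ == "__main__":
    _, _, with_veto = train(use_veto=True)
    _, _, without_veto = train(use_veto=False)
    print("episodes lost to UTS with veto:", with_veto)
    print("episodes lost to UTS without veto:", without_veto)
```

In this deterministic sketch, each UTS-producing state-action pair can be tried at most once when the veto is active, so the number of wasted episodes is bounded by the number of such pairs. A stochastic or continuous system, such as the robotic setting in the abstract, would need veto modules that generalize rather than memorize.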

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 317, 1 October 2015, Pages 25–47