Article code | Journal code | Publication year | English article | Full-text version
4947080 | 1439564 | 2017 | 37 pages, PDF | Free download
English title of the ISI article
Steering approaches to Pareto-optimal multiobjective reinforcement learning
Persian title (back-translated into English)
A steering approach to Pareto-optimal multiobjective reinforcement learning
Keywords
Multiobjective reinforcement learning, non-stationary policies, geometric steering, interactive reinforcement learning, Pareto optimality
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent's target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.
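To make concrete why a non-stationary policy can reach outcomes that no single stationary deterministic policy achieves, the following is a minimal sketch. It is not the paper's w-steering or Q-steering algorithm. It assumes two hypothetical deterministic policies, "A" and "B", with made-up per-episode return vectors, and a hypothetical `steer` helper that picks, each episode, whichever policy moves the running mean return closest to a chosen target point.

```python
# Illustrative sketch only: NOT the paper's w-steering or Q-steering algorithm.
# Assumption: two hypothetical deterministic policies, each with a fixed
# per-episode return vector. All names and numbers here are made up.

RETURNS = {"A": (10.0, 0.0), "B": (0.0, 10.0)}  # (objective 1, objective 2)


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def steer(target, episodes):
    """Each episode, run the policy whose return brings the running
    mean return vector closest to the target point."""
    total = (0.0, 0.0)
    choices = []
    for n in range(1, episodes + 1):
        best = min(
            RETURNS,
            key=lambda name: sq_dist(
                (
                    (total[0] + RETURNS[name][0]) / n,
                    (total[1] + RETURNS[name][1]) / n,
                ),
                target,
            ),
        )
        total = (total[0] + RETURNS[best][0], total[1] + RETURNS[best][1])
        choices.append(best)
    mean = (total[0] / episodes, total[1] / episodes)
    return mean, choices


# Neither policy alone yields a mean return near (7, 3); switching between them can.
mean_return, chosen = steer(target=(7.0, 3.0), episodes=1000)
print(mean_return, chosen.count("A"), chosen.count("B"))
```

In this toy setting the target lies on the line segment between the two policies' return vectors, so alternating between them in the right proportion brings the long-run average close to it. A target outside that segment would not be reachable by these two policies.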
Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 263, 8 November 2017, Pages 26-38