Article code: 4947079
Journal code: 1439564
Publication year: 2017
English article: 34-page PDF
Full-text version: free download
English title of the ISI article
A temporal difference method for multi-objective reinforcement learning
Persian translation of the title
یک روش تفاوت زمانی برای یادگیری تقویت چند هدف
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
This work describes MPQ-learning, an algorithm that approximates the set of all deterministic non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for an objective to maximize. MPQ-learning directly generalizes the ideas of Q-learning to the multi-objective case. It can be applied to non-convex Pareto frontiers and finds both supported and unsupported solutions. We present the results of applying MPQ-learning to some benchmark problems. The algorithm solves these problems successfully, which shows the feasibility of the approach. We also compare MPQ-learning to a standard linearization procedure that computes only supported solutions and show that in some cases MPQ-learning can be as effective as the scalarization method.
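The distinction the abstract draws between supported and unsupported solutions can be illustrated with a small sketch. This is not MPQ-learning itself: it only filters a handful of made-up two-objective value vectors (the `values` list and all function names are assumptions for illustration) and shows that a weighted-sum (linear scalarization) sweep never selects a non-dominated point lying in a non-convex region of the Pareto frontier.

```python
from typing import List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b in every objective and
    strictly better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(vectors: Sequence[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector dominates."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]


def best_by_weighted_sum(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Linear scalarization: pick the vector with the largest weighted sum."""
    return max(vectors, key=lambda v: sum(w * x for w, x in zip(weights, v)))


# Hypothetical value vectors, invented for this example only.
values: List[Vector] = [(1.0, 10.0), (4.0, 5.0), (10.0, 1.0), (3.0, 3.0)]

front = non_dominated(values)

# Sweep the weight on the first objective from 0 to 1 in steps of 0.1.
supported: Set[Vector] = {
    best_by_weighted_sum(front, (i / 10, 1 - i / 10)) for i in range(11)
}
unsupported = [v for v in front if v not in supported]

print("non-dominated:", front)
print("found by weighted sums:", sorted(supported))
print("missed by weighted sums:", unsupported)
```

In this toy data, (4.0, 5.0) is non-dominated but lies below the line joining (1.0, 10.0) and (10.0, 1.0), so no choice of weights makes it the maximizer. A method that tracks sets of non-dominated vectors rather than a single scalar value can, in principle, retain such points.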
Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 263, 8 November 2017, Pages 15-25