A temporal difference method for multi-objective reinforcement learning

Article ID	Journal ID	Year	Length
4947079	1439564	2017	34 pages (PDF)

Title

A temporal difference method for multi-objective reinforcement learning


Keywords

Q-learning; Multi-objective optimization; Reinforcement learning

Related subjects

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence


Abstract

This work describes MPQ-learning, an algorithm that approximates the set of all deterministic non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for an objective to maximize. MPQ-learning directly generalizes the ideas of Q-learning to the multi-objective case. It can be applied to non-convex Pareto frontiers and finds both supported and unsupported solutions. We present the results of applying MPQ-learning to some benchmark problems. The algorithm successfully solves these problems, showing the feasibility of this approach. We also compare MPQ-learning to a standard linearization procedure that computes only supported solutions, and show that in some cases MPQ-learning can be as effective as the scalarization method.
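The abstract's distinction between supported and unsupported solutions can be illustrated with a small sketch. This is not MPQ-learning itself and says nothing about how the paper performs its temporal-difference updates; it only shows, for a hypothetical set of two-objective policy returns, how a Pareto-dominance filter keeps a point on a non-convex front that linear scalarization (a weighted sum over a hypothetical weight grid) never selects. The function names, the data, and the weight grid are illustrative assumptions.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]  # one component per objective, all to be maximized


def dominates(a: Vector, b: Vector) -> bool:
    """True if a Pareto-dominates b: at least as good in every objective
    and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points: Sequence[Vector]) -> List[Vector]:
    """Return the points not dominated by any other point (the Pareto front),
    preserving their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarized_optima(points: Sequence[Vector], weights: Iterable[Vector]) -> Set[Vector]:
    """Linear scalarization: for each weight vector w, keep the point that
    maximizes the weighted sum w . p (ties go to the first such point).
    Only supported solutions can ever be returned this way."""
    found: Set[Vector] = set()
    for w in weights:
        found.add(max(points, key=lambda p: sum(wi * pi for wi, pi in zip(w, p))))
    return found


# Hypothetical two-objective returns of five deterministic policies.
points = [(0, 10), (4, 4), (10, 0), (3, 3), (2, 1)]
# Hypothetical grid of nonnegative weights summing to 1.
weights = [(k / 10, 1 - k / 10) for k in range(11)]

front = non_dominated(points)                   # [(0, 10), (4, 4), (10, 0)]
supported = scalarized_optima(points, weights)  # {(0, 10), (10, 0)}
unsupported = set(front) - supported            # {(4, 4)}
print(front, sorted(supported), unsupported)
```

Because (4, 4) lies below the segment joining (0, 10) and (10, 0), no nonnegative weighting ranks it first, so a method restricted to weighted sums misses it even though no other policy dominates it.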

Publisher

Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 263, 8 November 2017, Pages 15-25

Authors

Manuela Ruiz-Montiel, Lawrence Mandow, José-Luis Pérez-de-la-Cruz
