Free paper download: Policy invariance under reward transformations for multi-objective reinforcement learning

Article ID	Journal ID	Publication year	English article	Full text
4947083	1439564	2017	42-page PDF	Free download

English title of the ISI article

Policy invariance under reward transformations for multi-objective reinforcement learning



Keywords

Reinforcement learning, multi-objective, potential-based, reward shaping, multi-agent systems

Related subjects

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence


Abstract

Reinforcement Learning (RL) is a powerful and well-studied Machine Learning paradigm, where an agent learns to improve its performance in an environment by maximising a reward signal. In multi-objective Reinforcement Learning (MORL) the reward signal is a vector, where each component represents the performance on a different objective. Reward shaping is a well-established family of techniques that have been successfully used to improve the performance and learning speed of RL agents in single-objective problems. The basic premise of reward shaping is to add an additional shaping reward to the reward naturally received from the environment, to incorporate domain knowledge and guide an agent's exploration. Potential-Based Reward Shaping (PBRS) is a specific form of reward shaping that offers additional guarantees. In this paper, we extend the theoretical guarantees of PBRS to MORL problems. Specifically, we provide theoretical proof that PBRS does not alter the true Pareto front in both single- and multi-agent MORL. We also contribute the first published empirical studies of the effect of PBRS in single- and multi-agent MORL problems.
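The shaping idea described in the abstract can be illustrated with a minimal sketch. It uses the standard potential-based form F(s, s') = γΦ(s') − Φ(s) and applies it to each component of a vector reward. The per-objective potential function, the integer states, and the sample rewards are hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def shaped_reward(reward: Vector, s: int, s_next: int,
                  potential: Callable[[int], Vector], gamma: float) -> Vector:
    """Add the potential-based term gamma * Phi(s') - Phi(s) to each objective."""
    phi_s = potential(s)
    phi_next = potential(s_next)
    return tuple(r + gamma * pn - p
                 for r, pn, p in zip(reward, phi_next, phi_s))


def discounted_return(rewards: Sequence[Vector], gamma: float) -> Vector:
    """Discounted sum of a sequence of vector rewards, one total per objective."""
    total = [0.0] * len(rewards[0])
    for t, reward in enumerate(rewards):
        for i, r in enumerate(reward):
            total[i] += (gamma ** t) * r
    return tuple(total)


def potential(s: int) -> Vector:
    """Hypothetical potential: one heuristic value per objective."""
    return (float(s), -0.5 * s)


# Hypothetical two-objective trajectory: visited states and per-step rewards.
gamma = 0.9
states: List[int] = [0, 1, 3, 2]
rewards: List[Vector] = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]

shaped = [shaped_reward(r, s, s2, potential, gamma)
          for r, s, s2 in zip(rewards, states, states[1:])]

plain_return = discounted_return(rewards, gamma)
shaped_return = discounted_return(shaped, gamma)

# The shaping terms telescope, so the two returns differ only by a quantity
# that depends on the first and last states, not on the path between them.
T = len(rewards)
difference = tuple(a - b for a, b in zip(shaped_return, plain_return))
expected = tuple((gamma ** T) * p_end - p_start
                 for p_end, p_start in zip(potential(states[-1]),
                                           potential(states[0])))
```

In this sketch `difference` matches `expected` up to floating-point error for every objective. That path independence is the intuition behind the invariance guarantee the abstract refers to; the sketch is not a reproduction of the paper's proof or experiments.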

Publisher

Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 263, 8 November 2017, Pages 60-73

Authors

Patrick Mannion, Sam Devlin, Karl Mason, Jim Duggan, Enda Howley
