Article code: 405459
Journal code: 677641
Publication year: 2014
English article, full text: 19-page PDF, free download
English title of the ISI article
Policy oscillation is overshooting
Persian translation of the title
نوسان در سیاست، بیش از حد است
Keywords
Reinforcement learning; Approximate dynamic programming; Policy gradient; Natural gradient; Policy oscillation; Policy chattering
Related subjects
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract

A majority of approximate dynamic programming approaches to the reinforcement learning problem can be categorized into greedy value function methods and value-based policy gradient methods. The former approach, although fast, is well known to be susceptible to the policy oscillation phenomenon. We take a fresh view to this phenomenon by casting, within the context of non-optimistic policy iteration, a considerable subset of the former approach as a limiting special case of the latter. We explain the phenomenon in terms of this view and illustrate the underlying mechanism with artificial examples. We also use it to derive the constrained natural actor-critic algorithm that can interpolate between the aforementioned approaches. In addition, it has been suggested in the literature that the oscillation phenomenon might be subtly connected to the grossly suboptimal performance in the Tetris benchmark problem of all attempted approximate dynamic programming methods. Based on empirical findings, we offer a hypothesis that might explain the inferior performance levels and the associated policy degradation phenomenon, and which would partially support the suggested connection. Finally, we report scores in the Tetris problem that improve on existing dynamic programming based results by an order of magnitude.
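The abstract's central observation is that a greedy policy-improvement step can be viewed as a natural-gradient policy update taken with an unbounded step size. The following is a minimal illustrative sketch of that limiting behaviour, assuming a single state, a Gibbs (softmax) policy with tabular preferences, and invented advantage values; it is not the paper's constrained natural actor-critic algorithm.

# Illustrative sketch only: invented advantage values for one state with
# three actions; shows how a natural-gradient softmax update approaches
# the greedy (argmax) policy as the step size grows.
import numpy as np

def softmax(x):
    z = x - x.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

advantages = np.array([0.10, 0.50, 0.45])   # hypothetical A(s, a) estimates
theta = np.zeros_like(advantages)           # softmax policy preferences

for step_size in (1.0, 10.0, 1000.0):
    # For a Gibbs policy with tabular preferences, the natural gradient is
    # proportional to the advantage estimates, so one update is simply:
    updated = theta + step_size * advantages
    print(step_size, softmax(updated).round(3))

# As step_size grows, the policy concentrates on argmax_a A(s, a): the
# greedy improvement step is the limiting, maximally "overshooting" case.

Running the sketch prints increasingly peaked action distributions, which is the sense in which a greedy value-function step corresponds to an overshooting natural-gradient update.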

Publisher
Database: Elsevier - ScienceDirect
Journal: Neural Networks - Volume 52, April 2014, Pages 43–61
Authors
Paul Wagner