Analysis and improvement of policy gradient estimation

Article code	Journal code	Publication year	English article
406759	678108	2012	12-page PDF

Keywords

Variance reduction; Reinforcement learning

Related topics

Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence

Abstract

Policy gradient is a useful model-free reinforcement learning approach, but it tends to suffer from instability of gradient estimates. In this paper, we analyze and improve the stability of policy gradient methods. We first prove that the variance of gradient estimates in the PGPE (policy gradients with parameter-based exploration) method is smaller than that of the classical REINFORCE method under a mild assumption. We then derive the optimal baseline for PGPE, which contributes to further reducing the variance. We also theoretically show that PGPE with the optimal baseline is preferable to REINFORCE with the optimal baseline in terms of the variance of gradient estimates. Finally, we demonstrate the usefulness of the improved PGPE method through experiments.
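
The role of a baseline in a parameter-based gradient estimate can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian distribution over policy parameters with a fixed standard deviation `tau`, uses a hypothetical `toy_return` function in place of a real episode return, and applies the standard variance-minimizing scalar baseline for score-function estimators, E[R |g|^2] / E[|g|^2], which may differ in detail from the baseline derived in the paper. All function and variable names are illustrative.

```python
import random


def toy_return(theta):
    # Hypothetical stand-in for an episode return; it peaks at theta = (1, -1).
    target = (1.0, -1.0)
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def pgpe_gradient(eta, tau, n_samples, rng, use_baseline=True):
    """Estimate the gradient of the expected return w.r.t. the mean eta,
    where policy parameters are drawn as theta ~ N(eta, tau^2 I)."""
    samples = []
    for _ in range(n_samples):
        theta = [rng.gauss(m, tau) for m in eta]
        # Score function of the Gaussian with respect to its mean.
        score = [(t - m) / tau ** 2 for t, m in zip(theta, eta)]
        samples.append((toy_return(theta), score))

    baseline = 0.0
    if use_baseline:
        # Scalar baseline E[R |score|^2] / E[|score|^2], estimated from the
        # same samples (an assumption; this adds a small finite-sample bias).
        norms = [sum(s * s for s in score) for _, score in samples]
        weighted = sum(ret * n for (ret, _), n in zip(samples, norms))
        baseline = weighted / sum(norms)

    grad = [0.0] * len(eta)
    for ret, score in samples:
        for i, s in enumerate(score):
            grad[i] += (ret - baseline) * s / n_samples
    return grad


def empirical_variance(use_baseline, trials=200, seed=0):
    # Total variance (summed over dimensions) of repeated gradient estimates.
    rng = random.Random(seed)
    grads = [pgpe_gradient([0.0, 0.0], 1.0, 20, rng, use_baseline)
             for _ in range(trials)]
    dim = len(grads[0])
    means = [sum(g[i] for g in grads) / trials for i in range(dim)]
    return sum(sum((g[i] - means[i]) ** 2 for g in grads) / trials
               for i in range(dim))
```

Comparing `empirical_variance(True)` with `empirical_variance(False)` on this toy problem shows the kind of variance reduction the abstract refers to; it says nothing about the REINFORCE comparison, which would require a trajectory-level simulation.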

Publisher

Database: Elsevier - ScienceDirect
Journal: Neural Networks - Volume 26, February 2012, Pages 118–129

Authors

Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama
