Efficient exploration through active learning for value function approximation in reinforcement learning

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
405585	677681	2010	10 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Markov decision process - روند تصمیم گیری مارکوف Reinforcement learning - یادگیری تقویتی Active learning - یادگیری فعال

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش صفحه اول مقاله

Efficient exploration through active learning for value function approximation in reinforcement learning

چکیده انگلیسی

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Neural Networks - Volume 23, Issue 5, June 2010, Pages 639–648

نویسندگان

Takayuki Akiyama, Hirotaka Hachiya, Masashi Sugiyama,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Efficient exploration through active learning for value function approximation in reinforcement learning

دسترسی سریع

ارتباط

English Website