Article code | Journal code | Publication year | English article | Full-text version
---|---|---|---|---
485306 | 703324 | 2013 | 11-page PDF | Free download

The exploration–exploitation dilemma is a central theme in reinforcement learning. Under the standard tradeoff framework, a reinforcement learning agent must switch judiciously between exploration and exploitation, because the action estimated as best in the current learning state may not be the true best. We demonstrate that, under certain conditions, an agent can identify the best action even if it only ever exploits. Under these conditions the agent needs no explicit exploration phase, thereby resolving the exploration–exploitation dilemma. We also propose a value function over actions and an update rule for it. The proposed method, the "overtaking method," can be integrated with existing methods for the multi-armed bandit problem, UCB1 and UCB1-tuned, without compromising their features. The integrated models outperform the original models.
Journal: Procedia Computer Science - Volume 24, 2013, Pages 126-136
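The abstract does not spell out the overtaking method itself, but the UCB1 baseline it integrates with is standard. Below is a minimal Python sketch of UCB1 on a toy Bernoulli bandit, where each arm's score is its empirical mean reward plus an exploration bonus; the arm probabilities, variable names, and the helper `ucb1_select` are illustrative assumptions, not code from the paper.

```python
import math
import random

def ucb1_select(counts, values, t):
    """Pick an arm by the UCB1 rule: empirical mean plus
    an exploration bonus of sqrt(2 * ln t / n_i)."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # play every arm once before applying the bound
    return max(
        range(len(counts)),
        key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]),
    )

# Toy run against three Bernoulli arms (success probabilities are made up).
probs = [0.2, 0.5, 0.8]
counts = [0] * len(probs)
values = [0.0] * len(probs)
for t in range(1, 1001):
    arm = ucb1_select(counts, values, t)
    reward = 1.0 if random.random() < probs[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
print(counts)  # pulls should concentrate on the highest-probability arm
```

Because the bonus term shrinks as an arm is sampled, pulls gradually concentrate on the arm with the highest empirical mean; this is the baseline behavior the paper's integrated models aim to improve.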