Article code | Journal code | Publication year | English article | Full-text version
485306 | 703324 | 2013 | 11 pages, PDF | Free download
English title of the ISI article
Overtaking Method based on Variance of Values: Resolving the Exploration–Exploitation Dilemma
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science (General)
English abstract

The exploration–exploitation dilemma is an attractive theme in reinforcement learning. Under the tradeoff framework, a reinforcement learning agent must cleverly switch between exploration and exploitation because an action, which is estimated as the best in the current learning state, may not actually be the true best. We demonstrate that an agent can determine the best action under certain conditions even if the agent selects the exploitation phase. Under the conditions, the agent does not need an explicit exploration phase, thereby resolving the exploration–exploitation dilemma. We also propose a value function on actions and how to update this value function. The proposed method, the “overtaking method,” can be integrated with existing methods, UCB1 and UCB1-tuned, for the multi-armed bandit problem without compromising features. The integrated models show better results than the original models.
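
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard UCB1 rule for the multi-armed bandit problem. It is not the overtaking method, whose value function and update rule are not given in this abstract. The function name `ucb1`, the Bernoulli reward model, and the parameters `arm_probs`, `horizon`, and `seed` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def ucb1(arm_probs, horizon, seed=0):
    """Run standard UCB1 on simulated Bernoulli arms (illustrative assumption).

    arm_probs: hypothetical success probability of each arm.
    horizon:   total number of plays.
    Returns (counts, sums): plays per arm and total reward per arm.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k   # number of times each arm was played
    sums = [0.0] * k   # cumulative reward collected from each arm

    for t in range(horizon):
        if t < k:
            # Initialization: play every arm once.
            arm = t
        else:
            total = sum(counts)  # plays made so far

            def index(j):
                # Empirical mean plus the UCB1 confidence bonus.
                mean = sums[j] / counts[j]
                bonus = math.sqrt(2.0 * math.log(total) / counts[j])
                return mean + bonus

            arm = max(range(k), key=index)

        # Simulated Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts, sums


if __name__ == "__main__":
    counts, sums = ucb1([0.1, 0.9], horizon=1000)
    print("plays per arm:", counts)
```

UCB1 has no separate exploration phase. The bonus term shrinks as an arm is played more often, so under-sampled arms are revisited automatically. According to the abstract, the overtaking method can be integrated with this rule and with UCB1-tuned.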

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 24, 2013, Pages 126-136