Adaptive playouts for online learning of policies during Monte Carlo Tree Search

Article code	Journal code	Publication year	English article	Full text
4952485	1442041	2016	16-page PDF	Free download

Keywords

Computer Go; Monte Carlo tree search; Reinforcement learning

Related topics

Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics

Abstract

Monte Carlo Tree Search evaluates positions with the help of a playout policy. If the playout policy evaluates a position incorrectly, the tree search can have difficulty finding the correct move because of the large search space. This paper explores adaptive playout policies, which improve the playout policy during a tree search. Using policy gradient reinforcement learning techniques, we optimize the playout policy to give better evaluations. We tested the algorithm in Computer Go and measured an increase in playing strength of more than 100 Elo. The resulting program was able to deal with difficult test cases that are known to pose a problem for Monte Carlo Tree Search.
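
To make the idea of a policy gradient update on a playout policy concrete, the following is a minimal sketch of a generic REINFORCE-style step for a softmax policy. It is not taken from the paper: the one-feature-per-move representation, the function names (`softmax_probs`, `sample_move`, `reinforce_update`), the baseline, and the learning rate are all assumptions made for illustration, and the sketch only updates on the moves of the player whose reward is given.

```python
import math
import random


def softmax_probs(weights, moves):
    """Softmax over the candidate moves; each move is a hypothetical feature id."""
    top = max(weights[f] for f in moves)  # subtract the max for numerical stability
    exps = [math.exp(weights[f] - top) for f in moves]
    total = sum(exps)
    return [e / total for e in exps]


def sample_move(weights, moves, rng):
    """Sample a move index from the softmax policy and return it with the probabilities."""
    probs = softmax_probs(weights, moves)
    idx = rng.choices(range(len(moves)), weights=probs, k=1)[0]
    return idx, probs


def reinforce_update(weights, trajectory, reward, baseline, lr):
    """One REINFORCE step: w += lr * (reward - baseline) * grad log pi(chosen).

    trajectory is a list of (moves, chosen_index, probs) tuples recorded
    during a single playout (an assumed format, not the paper's).
    """
    advantage = reward - baseline
    for moves, chosen, probs in trajectory:
        for i, feature in enumerate(moves):
            # Gradient of log-softmax: 1 for the chosen move, minus its probability.
            grad = (1.0 if i == chosen else 0.0) - probs[i]
            weights[feature] += lr * advantage * grad
```

In this sketch a playout that ends better than the baseline raises the weights of the moves that were played and lowers the weights of the alternatives, which is the general mechanism by which a playout policy could be adapted while the search is running.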

Publisher

Database: Elsevier - ScienceDirect
Journal: Theoretical Computer Science - Volume 644, 6 September 2016, Pages 53-62

Authors

Tobias Graf, Marco Platzner
