Article ID | Journal ID | Publication year | English article | Full text
4952485 | 1442041 | 2016 | 16-page PDF | Free download
English title of the ISI article
Adaptive playouts for online learning of policies during Monte Carlo Tree Search
Related topics
Physical Sciences and Engineering, Computer Science, Computational Theory and Mathematics
English abstract
Monte Carlo Tree Search evaluates positions with the help of a playout policy. If the playout policy evaluates a position incorrectly, there are cases where the tree search has difficulty finding the correct move because of the large search space. This paper explores adaptive playout policies, which improve the playout policy during a tree search. Using policy gradient reinforcement learning techniques, we optimize the playout policy to give better evaluations. We tested the algorithm in Computer Go and measured an increase in playing strength of more than 100 Elo. The resulting program was able to deal with difficult test cases that are known to pose a problem for Monte Carlo Tree Search.
Publisher
Database: Elsevier - ScienceDirect
Journal: Theoretical Computer Science - Volume 644, 6 September 2016, Pages 53-62