Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
4952485 | 1442041 | 2016 | 16-page PDF | Free download |
English title of the ISI article
Adaptive playouts for online learning of policies during Monte Carlo Tree Search
Related topics
Engineering and Basic Sciences
Computer Engineering
Computational Theory and Mathematics
Abstract
Monte Carlo Tree Search evaluates positions with the help of a playout policy. If the playout policy evaluates a position incorrectly, the tree search can have difficulty finding the correct move because of the large search space. This paper explores adaptive playout policies, which improve the playout policy during a tree search. Using policy gradient reinforcement learning techniques, we optimize the playout policy to give better evaluations. We tested the algorithm in Computer Go and measured an increase in playing strength of more than 100 Elo. The resulting program was able to deal with difficult test cases that are known to pose a problem for Monte Carlo Tree Search.
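The idea of adjusting a playout policy with a policy gradient can be illustrated with a small sketch. This is not the authors' algorithm as published: it assumes a softmax playout policy over hypothetical per-move feature weights and a plain REINFORCE-style update with a fixed baseline. All names (`softmax_probs`, `grad_log_prob`, `reinforce_update`, `theta`, `alpha`, `baseline`) are illustrative assumptions.

```python
import math


def softmax_probs(theta, moves):
    """Return move probabilities for a softmax policy.

    theta: list of feature weights (hypothetical parameterization).
    moves: list of legal moves, each given as a list of active feature indices.
    """
    scores = [sum(theta[f] for f in feats) for feats in moves]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(theta, moves, chosen):
    """Gradient of log pi(chosen) with respect to theta for the softmax policy."""
    probs = softmax_probs(theta, moves)
    grad = [0.0] * len(theta)
    for f in moves[chosen]:
        grad[f] += 1.0  # features of the move that was played
    for p, feats in zip(probs, moves):
        for f in feats:
            grad[f] -= p  # expected feature counts under the policy
    return grad


def reinforce_update(theta, trajectory, reward, baseline=0.5, alpha=0.1):
    """One REINFORCE-style step after a finished playout (assumed update rule).

    trajectory: list of (moves, chosen) pairs for one player's decisions.
    reward: playout result for that player, e.g. 1.0 for a win, 0.0 for a loss.
    """
    advantage = reward - baseline
    new_theta = list(theta)
    for moves, chosen in trajectory:
        grad = grad_log_prob(theta, moves, chosen)
        for i, g in enumerate(grad):
            new_theta[i] += alpha * advantage * g
    return new_theta
```

In this sketch, a won playout raises the weights of the features of the moves that were played, and a lost playout lowers them, so later playouts in the same search sample those moves more or less often.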
Publisher
Database: Elsevier - ScienceDirect
Journal: Theoretical Computer Science - Volume 644, 6 September 2016, Pages 53-62
Authors
Tobias Graf, Marco Platzner