Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4952485 | Theoretical Computer Science | 2016 | 16 Pages |
Abstract
Monte Carlo Tree Search evaluates positions with the help of a playout policy. If the playout policy misjudges a position, there are cases where the tree search has difficulty finding the correct move because of the large search space. This paper explores adaptive playout policies, which improve the playout policy during a tree search. Using policy gradient reinforcement learning techniques, we optimize the playout policy to give better evaluations. We tested the algorithm in Computer Go and measured an increase in playing strength of more than 100 Elo. The resulting program was able to deal with difficult test cases that are known to pose a problem for Monte Carlo Tree Search.
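To make the idea of adapting a playout policy with policy gradients more concrete, the following is a minimal sketch of a REINFORCE-style update for a softmax playout policy. It is not the algorithm from the paper: the one-move toy game, the move names, the fixed baseline of 0.5, the learning rate, and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def softmax_probs(weights, moves):
    """Return softmax probabilities over `moves` given per-move weights."""
    top = max(weights[m] for m in moves)  # subtract max for numerical stability
    exps = [math.exp(weights[m] - top) for m in moves]
    total = sum(exps)
    return [e / total for e in exps]


def sample_move(weights, moves, rng):
    """Sample one move from the softmax playout policy."""
    probs = softmax_probs(weights, moves)
    return rng.choices(moves, weights=probs, k=1)[0]


def reinforce_update(weights, trajectory, reward, baseline, lr):
    """Apply one REINFORCE step for a finished playout.

    `trajectory` is a list of (legal_moves, chosen_move) pairs.
    For a softmax policy, d log pi(chosen) / d w[m] = 1[m == chosen] - pi(m).
    """
    advantage = reward - baseline
    for moves, chosen in trajectory:
        probs = softmax_probs(weights, moves)
        for move, p in zip(moves, probs):
            indicator = 1.0 if move == chosen else 0.0
            weights[move] += lr * advantage * (indicator - p)


def train_toy(seed=0, playouts=500, lr=0.1):
    """Hypothetical one-move game: only move 'b' wins (reward 1, else 0)."""
    rng = random.Random(seed)
    moves = ["a", "b", "c"]
    weights = {m: 0.0 for m in moves}  # uniform policy at the start
    for _ in range(playouts):
        chosen = sample_move(weights, moves, rng)
        reward = 1.0 if chosen == "b" else 0.0
        reinforce_update(weights, [(moves, chosen)], reward, baseline=0.5, lr=lr)
    return weights, dict(zip(moves, softmax_probs(weights, moves)))


if __name__ == "__main__":
    final_weights, final_probs = train_toy()
    print(final_weights)
    print(final_probs)
```

In this toy setting, every update raises the weight of the winning move relative to the others, so the policy gradually prefers it. In a real search, the playouts would be full games, and the reward would come from the game result rather than from a hard-coded rule.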
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Tobias Graf, Marco Platzner