Adaptive playouts for online learning of policies during Monte Carlo Tree Search

Article ID	Journal	Published Year	Pages	File Type
4952485	Theoretical Computer Science	2016	16 Pages	PDF

Abstract

Monte Carlo Tree Search evaluates positions with the help of a playout policy. If the playout policy misjudges a position, the tree search can have difficulty finding the correct move because of the large search space. This paper explores adaptive playout policies, which improve the playout policy during a tree search. Using policy gradient reinforcement learning techniques, we optimize the playout policy to give better evaluations. We tested the algorithm in Computer Go and measured an increase in playing strength of more than 100 Elo. The resulting program was able to handle difficult test cases that are known to pose a problem for Monte Carlo Tree Search.
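
To make the idea of adjusting a playout policy by policy gradient more concrete, the following is a minimal sketch of a generic REINFORCE-style update for a softmax playout policy. It is not taken from the paper: the feature representation (each candidate move described by a list of integer feature ids), the function names, the baseline, and the step size are all assumptions made for illustration, and the two-player sign handling needed in Go is left out.

```python
import math


def softmax_probs(weights, features_per_move):
    """Move probabilities under a softmax policy (illustrative, not the paper's).

    weights: list of floats, one per feature id (hypothetical representation).
    features_per_move: list of lists of feature ids, one list per candidate move.
    """
    scores = [sum(weights[f] for f in feats) for feats in features_per_move]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(weights, trajectory, reward, baseline, step_size):
    """Apply one REINFORCE step after a finished playout.

    trajectory: list of (features_per_move, chosen_index) pairs, one per decision.
    reward: playout outcome, e.g. 1.0 for a win and 0.0 for a loss (assumption).
    baseline: expected outcome, e.g. the current win-rate estimate (assumption).
    """
    advantage = reward - baseline
    grad = [0.0] * len(weights)
    for features_per_move, chosen in trajectory:
        probs = softmax_probs(weights, features_per_move)
        # Gradient of log pi(chosen): features of the chosen move
        # minus the probability-weighted features of all candidate moves.
        for f in features_per_move[chosen]:
            grad[f] += 1.0
        for p, feats in zip(probs, features_per_move):
            for f in feats:
                grad[f] -= p
    # Gradient is accumulated first so every step uses the pre-update weights.
    return [w + step_size * advantage * g for w, g in zip(weights, grad)]


# Toy usage: one decision with two candidate moves, each with one feature.
new_weights = reinforce_update(
    weights=[0.0, 0.0],
    trajectory=[([[0], [1]], 0)],  # move 0 was played
    reward=1.0,
    baseline=0.5,
    step_size=0.1,
)
```

In this toy case the playout that chose move 0 ended better than the baseline, so the weight of that move's feature rises and the weight of the alternative falls, which makes move 0 more likely in later playouts.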

Keywords

Computer Go; Monte Carlo tree search; Reinforcement learning