Near-optimal no-regret algorithms for zero-sum games

Article ID	Journal	Published Year	Pages	File Type
5071599	Games and Economic Behavior	2015	22 Pages	PDF

Abstract

We propose a new no-regret learning algorithm. When used against an adversary, our algorithm achieves average regret that scales optimally as O(1T) with the number T of rounds. However, when our algorithm is used by both players of a zero-sum game, their average regret scales as O(lnâ¡TT), guaranteeing a near-linear rate of convergence to the value of the game. This represents an almost-quadratic improvement on the rate of convergence to the value of a zero-sum game known to be achievable by any no-regret learning algorithm. Moreover, it is essentially optimal as we also show a lower bound of Î©(1T) for all distributed dynamics, as long as the players do not know their payoff matrices in the beginning of the dynamics. (If they did, they could privately compute minimax strategies and play them ad infinitum.)

Keywords

C72 Zero-sum games C73 Repeated games Learning