Article ID: 719649
Journal: IFAC Proceedings Volumes
Published: 2010
Length: 8 pages (PDF)
Abstract

Because model-based reinforcement learning (RL) outperforms traditional RL, in this paper we extend traditional model-based RL to a group of self-interested agents that select their actions consecutively, one after another, while each seeks an optimal policy. Each decision-making situation is modeled as an extensive-form game with perfect information. We propose a modified version of prioritized sweeping in which the subgame-perfect equilibrium is taken as the optimal joint action. Finally, we analyze the algorithm and provide a formal convergence proof.
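The abstract does not spell out the update rule, so the following is only a minimal sketch of how subgame-perfect backups might slot into prioritized sweeping, under our own simplifying assumptions: two agents that always move in a fixed order (agent 0 first, then agent 1, who observes agent 0's choice), a deterministic environment whose learned model keeps the last observed transition, and tabular per-agent Q-values over joint actions. The stage game at each state is solved by backward induction, which yields the subgame-perfect joint action for a two-move game of perfect information. All names (`spe_joint_action`, `PrioritizedSweepingSPE`, `gamma`, `theta`, `n_sweeps`) are hypothetical and not taken from the paper.

```python
import heapq
import itertools
from collections import defaultdict


def spe_joint_action(q0, q1, actions0, actions1):
    """Backward induction on a two-move perfect-information stage game.

    Assumption: agent 0 moves first; agent 1 observes a0 and best-responds.
    q0[(a0, a1)] and q1[(a0, a1)] are the agents' payoffs (here, Q-values).
    Ties are broken in favor of the action listed first.
    """
    # Last mover: agent 1's best reply to every possible first move.
    best_reply = {a0: max(actions1, key=lambda a1: q1[(a0, a1)]) for a0 in actions0}
    # First mover: agent 0 anticipates that reply.
    a0_star = max(actions0, key=lambda a0: q0[(a0, best_reply[a0])])
    return a0_star, best_reply[a0_star]


class PrioritizedSweepingSPE:
    """Hypothetical prioritized sweeping with subgame-perfect state values."""

    def __init__(self, actions0, actions1, gamma=0.9, theta=1e-4, n_sweeps=20):
        self.A0, self.A1 = list(actions0), list(actions1)
        self.gamma, self.theta, self.n_sweeps = gamma, theta, n_sweeps
        # Q[i][s][(a0, a1)]: agent i's value of joint action (a0, a1) in state s.
        self.Q = [defaultdict(lambda: defaultdict(float)) for _ in range(2)]
        self.model = {}                       # (s, a0, a1) -> (rewards, s_next, done)
        self.predecessors = defaultdict(set)  # s_next -> {(s, a0, a1), ...}
        self.queue, self.counter = [], itertools.count()

    def state_values(self, s):
        """Each agent's value of s when both play the SPE joint action."""
        ja = spe_joint_action(self.Q[0][s], self.Q[1][s], self.A0, self.A1)
        return self.Q[0][s][ja], self.Q[1][s][ja]

    def _target(self, s, a0, a1):
        # Deterministic-model backup: r_i + gamma * V_i(s') for each agent i.
        rewards, s_next, done = self.model[(s, a0, a1)]
        v_next = (0.0, 0.0) if done else self.state_values(s_next)
        return [rewards[i] + self.gamma * v_next[i] for i in range(2)]

    def _push(self, key):
        # Priority is the largest change the backup would make for any agent.
        s, ja = key[0], key[1:]
        target = self._target(*key)
        priority = max(abs(target[i] - self.Q[i][s][ja]) for i in range(2))
        if priority > self.theta:
            heapq.heappush(self.queue, (-priority, next(self.counter), key))

    def observe(self, s, a0, a1, rewards, s_next, done):
        """Record a real transition, then run up to n_sweeps planning backups."""
        self.model[(s, a0, a1)] = (tuple(rewards), s_next, done)
        self.predecessors[s_next].add((s, a0, a1))
        self._push((s, a0, a1))
        for _ in range(self.n_sweeps):
            if not self.queue:
                break
            _, _, (ps, p0, p1) = heapq.heappop(self.queue)
            target = self._target(ps, p0, p1)
            for i in range(2):
                self.Q[i][ps][(p0, p1)] = target[i]
            # A change at ps may change the SPE value its predecessors back up.
            for pred in self.predecessors[ps]:
                self._push(pred)
```

For example, if the payoffs (agent 0, agent 1) in a stage game are (L, l) → (2, 1), (L, r) → (0, 0), (R, l) → (3, 0) and (R, r) → (1, 2), backward induction returns (L, l): agent 1 would answer R with r, so agent 0 prefers L even though its single largest payoff lies at (R, l). Handling stochastic transitions would require the model to store estimated transition probabilities instead of the last observed successor.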
