Article ID: 713019
Journal: IFAC Proceedings Volumes
Published Year: 2013
Pages: 6
File Type: PDF
Abstract

We consider the two-armed bandit problem in the following robust (minimax) setting. The reward distributions of the first arm have a known, finite mathematical expectation. The reward distributions of the second arm are normal with unknown mathematical expectation and unit variance. By the main theorem of the theory of games, the minimax strategy and minimax risk are sought as Bayesian ones corresponding to the worst-case prior distribution. In the case considered, the worst-case prior distribution is concentrated at two points, which makes numerical optimization applicable. The results are applied to systems with parallel data processing, including reward distributions other than normal.
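
To make the game-theoretic recipe concrete (maximize the Bayes risk over candidate priors, here assumed concentrated at two points), the following is a minimal Monte Carlo sketch in Python. It is not the paper's construction: it fixes a naive "explore then commit" strategy instead of the Bayes-optimal one, assumes a symmetric two-point prior placing the unknown mean of the second arm at m1 ± delta, and searches numerically for the delta at which this strategy's Bayes regret peaks. All names and parameter values (bayes_regret, n, n_try, n_mc) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayes_regret(delta, n=100, n_try=20, m1=0.0, n_mc=20000):
    """Monte Carlo estimate of the Bayes regret of a naive
    'sample arm 2 for n_try steps, then commit' strategy under a
    symmetric two-point prior: the unknown mean of arm 2 equals
    m1 + delta or m1 - delta with probability 1/2 each.
    (Hypothetical strategy, for illustration only.)"""
    # Draw the unknown mean of arm 2 from the two-point prior.
    signs = rng.choice([-1.0, 1.0], size=n_mc)
    m2 = m1 + signs * delta
    # Exploration: sample mean of n_try unit-variance normal rewards from arm 2.
    xbar = m2 + rng.standard_normal(n_mc) / np.sqrt(n_try)
    # Commit to arm 2 for the remaining pulls iff its sample mean beats m1.
    pick2 = xbar > m1
    # Regret relative to always playing the better arm; commit phase uses the
    # expected reward of the chosen arm.
    best = np.maximum(m1, m2)
    reward = n_try * xbar + (n - n_try) * np.where(pick2, m2, m1)
    return float(np.mean(n * best - reward))

# Numerical search for the least favorable two-point prior for this strategy.
deltas = np.linspace(0.01, 2.0, 100)
risks = [bayes_regret(d) for d in deltas]
i = int(np.argmax(risks))
print(f"worst-case delta ~= {deltas[i]:.2f}, "
      f"Bayes regret under that two-point prior ~= {risks[i]:.2f}")
```

In the setting of the abstract one would replace the fixed naive strategy with the Bayes strategy computed for each candidate prior, so that the maximization over the two-point prior yields the minimax strategy and minimax risk rather than the worst-case behavior of a single heuristic.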

Related Topics
Physical Sciences and Engineering > Engineering > Computational Mechanics