Article ID: 713019
Journal: IFAC Proceedings Volumes
Published Year: 2013
Pages: 6
File Type: PDF
Abstract

We consider the two-armed bandit problem in the following robust (minimax) setting. The reward distributions of the first arm have a known, finite mathematical expectation. The reward distributions of the second arm are normal with unknown mathematical expectation and unit variance. By the main theorem of the theory of games, the minimax strategy and minimax risk are sought as Bayesian ones corresponding to the worst-case prior distribution. In the case considered, the worst-case prior distribution is concentrated at two points, which makes numerical optimization applicable. The results are applied to systems with parallel data processing, including reward distributions other than normal.
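
To make the game-theoretic recipe concrete (maximize the Bayes risk over candidate priors, here assumed concentrated at two points), the following is a minimal Monte Carlo sketch in Python. It is not the paper's construction: it fixes a naive "explore then commit" strategy instead of the Bayes-optimal one, assumes a symmetric two-point prior placing the unknown mean of the second arm at m1 ± delta, and searches numerically for the delta at which this strategy's Bayes regret peaks. All names and parameter values (bayes_regret, n, n_try, n_mc) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayes_regret(delta, n=100, n_try=20, m1=0.0, n_mc=20000):
    """Monte Carlo estimate of the Bayes regret of a naive
    'sample arm 2 for n_try steps, then commit' strategy under a
    symmetric two-point prior: the unknown mean of arm 2 equals
    m1 + delta or m1 - delta with probability 1/2 each.
    (Hypothetical strategy, for illustration only.)"""
    # Draw the unknown mean of arm 2 from the two-point prior.
    signs = rng.choice([-1.0, 1.0], size=n_mc)
    m2 = m1 + signs * delta
    # Exploration: sample mean of n_try unit-variance normal rewards from arm 2.
    xbar = m2 + rng.standard_normal(n_mc) / np.sqrt(n_try)
    # Commit to arm 2 for the remaining pulls iff its sample mean beats m1.
    pick2 = xbar > m1
    # Regret relative to always playing the better arm; commit phase uses the
    # expected reward of the chosen arm.
    best = np.maximum(m1, m2)
    reward = n_try * xbar + (n - n_try) * np.where(pick2, m2, m1)
    return float(np.mean(n * best - reward))

# Numerical search for the least favorable two-point prior for this strategy.
deltas = np.linspace(0.01, 2.0, 100)
risks = [bayes_regret(d) for d in deltas]
i = int(np.argmax(risks))
print(f"worst-case delta ~= {deltas[i]:.2f}, "
      f"Bayes regret under that two-point prior ~= {risks[i]:.2f}")
```

In the setting of the abstract one would replace the fixed naive strategy with the Bayes strategy computed for each candidate prior, so that the maximization over the two-point prior yields the minimax strategy and minimax risk rather than the worst-case behavior of a single heuristic.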

Related Topics
Physical Sciences and Engineering > Engineering > Computational Mechanics