MINIMAX R-STAGE STRATEGY FOR THE MULTI-ARMED BANDIT PROBLEM

Article ID	Journal	Published Year	Pages	File Type
720765	IFAC Proceedings Volumes	2007	6 Pages	PDF

Abstract

The r-stage multi-armed bandit problem is considered in a minimax setting over a finite, sufficiently large time horizon T. A sequential control procedure is proposed in which the lengths of the learning stages and the thresholds are specified a priori. A minimax risk of order close to T^α, with α = 2^(r−1)/(2^r − 1), is obtained. Applications to information transmission and medical treatment are discussed. The approach is especially valuable for systems with parallel processing, in which the total duration of the process is determined mainly by the number of stages r.
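
To make the dependence on the number of stages concrete, the following is a minimal sketch that tabulates the risk exponent for small r. It assumes the exponent in the abstract is read as α = 2^(r−1)/(2^r − 1); the function name risk_exponent is hypothetical and not taken from the paper.

```python
from fractions import Fraction


def risk_exponent(r: int) -> Fraction:
    """Exponent alpha = 2^(r-1) / (2^r - 1), under the assumed reading of the abstract."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    return Fraction(2 ** (r - 1), 2 ** r - 1)


# Tabulate the exponent exactly and as a decimal for the first few stage counts.
for r in range(1, 6):
    alpha = risk_exponent(r)
    print(f"r={r}: alpha = {alpha} (about {float(alpha):.4f})")
```

Under this reading, α equals 1 for r = 1, 2/3 for r = 2 and 4/7 for r = 3, and it decreases toward 1/2 as r grows, so each extra stage lowers the exponent by a smaller amount than the one before.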

Keywords

Sequential decisions; Multi-armed bandit; Parallel processing; Minimax control