Article ID Journal Published Year Pages File Type
720765 IFAC Proceedings Volumes 2007 6 Pages PDF
Abstract

The r-stage multi-armed bandit problem is considered in minimax setting on the finite sufficiently large time interval T. A sequential control procedure with a priori specified magnitudes of learning stages and thresholds is offered. The value of the minimax risk close to Tα with α = 2r–1/(2r – 1) is obtained. The applications to information transmission and medical treatments are discussed. Considered approach is especially valuable for systems with parallel processing in which the number of stages r mainly influences the total duration of the process.

Related Topics
Physical Sciences and Engineering Engineering Computational Mechanics
Authors
, ,