Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
720765 | IFAC Proceedings Volumes | 2007 | 6 Pages |
Abstract
The r-stage multi-armed bandit problem is considered in a minimax setting on a finite, sufficiently large time horizon T. A sequential control procedure with a priori specified lengths of the learning stages and thresholds is proposed. A minimax risk of order close to $T^{\alpha}$ with $\alpha = 2^{r-1}/(2^{r} - 1)$ is obtained. Applications to information transmission and medical treatments are discussed. The approach is especially valuable for systems with parallel processing, in which the total duration of the process is determined mainly by the number of stages r.
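To make the dependence of the risk exponent on the number of stages concrete, the following minimal sketch tabulates the exponent for small r. It assumes the exponent in the abstract reads α = 2^(r−1)/(2^r − 1); the function name `risk_exponent` is hypothetical and introduced here only for illustration, not taken from the paper.

```python
from fractions import Fraction


def risk_exponent(r: int) -> Fraction:
    """Return alpha = 2^(r-1) / (2^r - 1) as an exact fraction.

    Assumption: this is the exponent of T in the minimax risk bound
    stated in the abstract, for an r-stage procedure.
    """
    if r < 1:
        raise ValueError("r must be a positive integer")
    return Fraction(2 ** (r - 1), 2 ** r - 1)


if __name__ == "__main__":
    # Tabulate the exponent for the first few stage counts.
    for r in range(1, 6):
        alpha = risk_exponent(r)
        print(f"r={r}: alpha = {alpha} (approx. {float(alpha):.4f})")
```

Under this reading the exponent equals 1 for r = 1, 2/3 for r = 2 and 4/7 for r = 3, and it decreases toward 1/2 as r grows, so most of the improvement is already obtained with a small number of stages.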
Related Topics
Physical Sciences and Engineering
Engineering
Computational Mechanics
Authors
A.V. Kolnogorov, S.V. Melnikova