Article ID: 6863650
Journal: Neurocomputing
Published Year: 2018
Pages: 12
File Type: PDF
Abstract
In this paper, we propose a set of allocation strategies for the multi-armed bandit problem, the possibilistic reward (PR) methods. First, we use possibilistic reward distributions to model the uncertainty about the expected reward of each arm, derived from an infinite set of confidence intervals nested around the expected value. Depending on the inequality used to compute the confidence intervals, three PR methods with different features are obtained. Next, we use a pignistic probability transformation to convert these possibilistic functions into probability distributions, following the insufficient reason principle. Finally, Thompson sampling techniques are used to identify the arm with the highest expected reward and to play that arm. A numerical study analyses the performance of the proposed methods against other policies in the literature. Two PR methods perform well in all representative scenarios under consideration and are the best allocation strategies when truncated Poisson or exponential distributions in [0, 10] are considered for the arms.
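The abstract describes a three-step allocation loop: build a distribution over each arm's expected reward from nested confidence intervals, convert it to a probability distribution via the pignistic transform, and use Thompson-style sampling to pick the arm to play. The Python sketch below illustrates that general loop only, under stated assumptions: a Hoeffding interval stands in for whichever inequality the paper uses, sample_plausible_mean approximates "sample from the pignistic transform of nested intervals" by drawing a level uniformly and then a point uniformly inside the corresponding interval, and run_bandit, the truncated-exponential reward model, and all parameter values are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np

def hoeffding_interval(mean, n, alpha, low=0.0, high=10.0):
    """Two-sided Hoeffding interval for a bounded reward in [low, high].

    The Hoeffding inequality is an assumption here; the paper considers
    several inequalities, each yielding a different PR method.
    """
    if n == 0:
        return low, high
    half_width = (high - low) * np.sqrt(np.log(2.0 / alpha) / (2.0 * n))
    return max(low, mean - half_width), min(high, mean + half_width)

def sample_plausible_mean(mean, n, rng, low=0.0, high=10.0):
    """Draw one plausible value of the arm's expected reward.

    Illustrative reading of sampling from the pignistic transform of the
    nested intervals: draw a miscoverage level uniformly, then draw
    uniformly inside the corresponding interval around the empirical mean.
    """
    alpha = rng.uniform(1e-6, 1.0)
    lo, hi = hoeffding_interval(mean, n, alpha, low, high)
    return rng.uniform(lo, hi)

def run_bandit(true_means, horizon=10_000, low=0.0, high=10.0, seed=0):
    """Thompson-sampling-style allocation loop over the arms."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k, dtype=int)
    means = np.zeros(k)
    total_reward = 0.0
    for _ in range(horizon):
        # Sample one plausible expected reward per arm; play the best one.
        samples = [sample_plausible_mean(means[i], counts[i], rng, low, high)
                   for i in range(k)]
        arm = int(np.argmax(samples))
        # Hypothetical reward model: exponential rewards truncated to [low, high].
        reward = min(high, rng.exponential(true_means[arm]))
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward
    return counts, total_reward

counts, total = run_bandit([2.0, 3.5, 5.0])
print("pulls per arm:", counts, "cumulative reward:", round(total, 1))
```

In this sketch the exploration comes entirely from the width of the sampled interval: heavily played arms get tight intervals and are exploited, rarely played arms keep wide intervals and remain candidates, which mirrors the role the possibilistic reward distributions play in the abstract's description.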
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence
Authors