Article code: 6863650
Journal code: 1439517
Publication year: 2018
English article: 12-page PDF
Full-text version: Free download
English title of the ISI article
Possibilistic reward methods for the multi-armed bandit problem
Persian translation of the title
روش پاداش های مثبت برای مشکل راهزنی چند مسلح
Keywords
Multi-armed bandit problem, possibilistic reward, numerical study
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
In this paper, we propose a set of allocation strategies to deal with the multi-armed bandit problem, the possibilistic reward (PR) methods. First, we use possibilistic reward distributions to model the uncertainty about the expected rewards from the arm, derived from an infinite set of confidence intervals nested around the expected value. Depending on the inequality used to compute the confidence intervals, there are three possible PR methods with different features. Next, we use a pignistic probability transformation to convert these possibilistic functions into probability distributions following the insufficient reason principle. Finally, Thompson sampling techniques are used to identify the arm with the higher expected reward and play that arm. A numerical study analyses the performance of the proposed methods with respect to other policies in the literature. Two PR methods perform well in all representative scenarios under consideration, and are the best allocation strategies if truncated Poisson or exponential distributions in [0,10] are considered for the arms.
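The three steps in the abstract (nested confidence intervals, pignistic transformation, Thompson-style sampling) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes rewards in [0, 1] and uses Hoeffding's inequality for the confidence intervals, which may or may not be one of the three inequalities used in the paper. The pignistic step is approximated by drawing a level `alpha` uniformly and then sampling uniformly from the corresponding interval. The names `sample_arm` and `run_bandit` are hypothetical.

```python
import math
import random


def sample_arm(mean, n, rng):
    """Draw one plausible expected reward for an arm with rewards in [0, 1]."""
    if n == 0:
        # No observations yet: treat every value in [0, 1] as equally plausible.
        return rng.random()
    # Level alpha is uniform on (0, 1]; this avoids log(2 / 0).
    alpha = 1.0 - rng.random()
    # Assumed Hoeffding bound: P(|mean - mu| >= eps) <= 2 * exp(-2 * n * eps**2).
    # Setting the bound equal to alpha gives the half-width of the interval.
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    low, high = max(0.0, mean - eps), min(1.0, mean + eps)
    # Insufficient reason principle: uniform draw inside the chosen interval.
    return rng.uniform(low, high)


def run_bandit(true_means, horizon, seed=0):
    """Simulate Bernoulli arms and return how often each arm was played."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for _ in range(horizon):
        draws = [
            sample_arm(sums[i] / counts[i] if counts[i] else 0.0, counts[i], rng)
            for i in range(k)
        ]
        # Thompson-style step: play the arm whose sampled value is largest.
        arm = max(range(k), key=draws.__getitem__)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(run_bandit([0.1, 0.9], horizon=2000))
```

Because the intervals shrink as an arm is played more often, the sampled values concentrate around the empirical mean, so a clearly better arm should end up being played most of the time in this toy setting.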
Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 310, 8 October 2018, Pages 201-212