کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
1142264 957139 2015 6 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem
موضوعات مرتبط
مهندسی و علوم پایه ریاضیات ریاضیات گسسته و ترکیبات
پیش نمایش صفحه اول مقاله
An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem
چکیده انگلیسی

We improved an upper bound on the expected regret of a UCB-type policy LLR for a bandit problem that repeats the following rounds: a player selects a maximal matching on a complete bipartite graph KM,NKM,N and receives a reward for each component edge of the selected matching. Rewards are assumed to be generated independently of its previous rewards according to an unknown fixed distribution. Our upper bound is smaller than the best known result (Chen et al., 2013) by a factor of Θ(M2/3)Θ(M2/3).

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Operations Research Letters - Volume 43, Issue 6, November 2015, Pages 558–563
نویسندگان
, , ,