An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
1142264	957139	2015	6 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Regret analysis Online learning - آموزش آنلاین Matching - تطابق Multi-armed bandit problem - مشکل چند گانه مسلحانه

موضوعات مرتبط

مهندسی و علوم پایه ریاضیات ریاضیات گسسته و ترکیبات

پیش نمایش صفحه اول مقاله

An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem

چکیده انگلیسی

We improved an upper bound on the expected regret of a UCB-type policy LLR for a bandit problem that repeats the following rounds: a player selects a maximal matching on a complete bipartite graph KM,NKM,N and receives a reward for each component edge of the selected matching. Rewards are assumed to be generated independently of its previous rewards according to an unknown fixed distribution. Our upper bound is smaller than the best known result (Chen et al., 2013) by a factor of Θ(M2/3)Θ(M2/3).

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Operations Research Letters - Volume 43, Issue 6, November 2015, Pages 558–563

نویسندگان

Ryo Watanabe, Atsuyoshi Nakamura, Mineichi Kudo,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem

دسترسی سریع

ارتباط

English Website