The K-armed dueling bandits problem

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
430291	687959	2012	19 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Online learning - آموزش آنلاین Preference elicitation - ترجیح دادن Multi-armed bandits - راهزنان چندگانه

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات

پیش نمایش صفحه اول مقاله

چکیده انگلیسی

We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting assumes only that (noisy) binary feedback about the relative reward of two chosen strategies is available. This type of relative feedback is particularly appropriate in applications where absolute rewards have no natural scale or are difficult to measure (e.g., user-perceived quality of a set of retrieval results, taste of food, product attractiveness), but where pairwise comparisons are easy to make. We propose a novel regret formulation in this setting, as well as present an algorithm that achieves information-theoretically optimal regret bounds (up to a constant factor).

► We study a bandit setting where the only feedback is the result of comparing pairs of arms.
► We formulate a regret minimization objective for this setting.
► We develop an algorithm for this setting and prove that it is optimal (up to constant factors).

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Computer and System Sciences - Volume 78, Issue 5, September 2012, Pages 1538–1556

نویسندگان

Yisong Yue, Josef Broder, Robert Kleinberg, Thorsten Joachims,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

The K-armed dueling bandits problem

دسترسی سریع

ارتباط

English Website