Linear Bayes policy for learning in contextual-bandits

Article code	Journal code	Publication year	English article	Full-text version
383591	660827	2013	7-page PDF	Free download

Keywords

Empirical Bayes; Online advertising; Recommender systems

Related topics

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence

Abstract

• A new selection rule for contextual bandits and single-step reinforcement learning.
• Using empirical approximations to Bayes' rule is an effective approach.
• This technique placed second in the "New Challenges for Exploration & Exploitation 3" challenge.

Machine and statistical learning techniques are used in almost all online advertising systems. The problem of discovering which content is in greater demand (e.g., receives more clicks) can be modeled as a multi-armed bandit problem. Contextual bandits (also known as bandits with covariates, bandits with side information, or associative reinforcement learning) associate with each specific piece of content several features that define the "context" in which it appears (e.g., user, web page, time, region). This problem can be studied in the stochastic/statistical setting through the conditional probability paradigm, using Bayes' theorem. However, for very large contextual information and/or under real-time constraints, the exact calculation of Bayes' rule is computationally infeasible. In this article, we present a method that can handle large contextual information for learning in contextual-bandit problems. This method was tested in the challenge on the Yahoo! dataset at the ICML 2012 workshop "New Challenges for Exploration & Exploitation 3", obtaining second place. Its basic exploration policy is deterministic in the sense that the same input data (as a time series) always yields the same results. We address the deterministic exploration vs. exploitation issue, explaining how the proposed method deterministically finds an effective dynamic trade-off based solely on the input data, in contrast to other methods that use a random number generator.
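
To make the idea of a deterministic, data-driven exploration policy concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes a simple stand-in: each arm is scored by the average of per-feature empirical click rates (a linear combination of empirical conditional estimates), and optimistic pseudo-counts replace random exploration. The class name `LinearEmpiricalBayesPolicy`, its parameters, and the scoring formula are all hypothetical and chosen for illustration only.

```python
class LinearEmpiricalBayesPolicy:
    """Illustrative only: NOT the method from the paper.

    Assumption: an arm's score is the mean of per-feature empirical
    click rates, smoothed with optimistic pseudo-counts.
    """

    def __init__(self, prior_clicks=1.0, prior_views=1.0):
        # With these defaults an unseen (arm, feature) pair has rate 1.0,
        # so untried arms look attractive and get explored without any
        # random number generator.
        self.prior_clicks = prior_clicks
        self.prior_views = prior_views
        self.clicks = {}  # (arm, feature) -> accumulated reward
        self.views = {}   # (arm, feature) -> number of displays

    def _rate(self, arm, feature):
        key = (arm, feature)
        clicks = self.clicks.get(key, 0.0) + self.prior_clicks
        views = self.views.get(key, 0.0) + self.prior_views
        return clicks / views

    def score(self, arm, features):
        # Linear combination (here: a plain average) of per-feature rates.
        if not features:
            return self.prior_clicks / self.prior_views
        return sum(self._rate(arm, f) for f in features) / len(features)

    def select(self, arms, features):
        # max() returns the first maximal item, so ties are broken by the
        # order of `arms`: the same input always gives the same choice.
        return max(arms, key=lambda arm: self.score(arm, features))

    def update(self, arm, features, reward):
        for f in features:
            key = (arm, f)
            self.views[key] = self.views.get(key, 0.0) + 1.0
            self.clicks[key] = self.clicks.get(key, 0.0) + reward


if __name__ == "__main__":
    policy = LinearEmpiricalBayesPolicy()
    arms = ["a", "b"]
    context = ["user:u1", "page:home"]
    first = policy.select(arms, context)   # tie, so the first arm is chosen
    policy.update(first, context, reward=0)
    second = policy.select(arms, context)  # the unrewarded arm now scores lower
    print(first, second)
```

Because the sketch uses no randomness, replaying the same event stream reproduces the same sequence of selections, which is the property the abstract describes. The exploration pressure comes entirely from the pseudo-counts and the observed data.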

Publisher

Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 40, Issue 18, 15 December 2013, Pages 7400–7406

Authors

José Antonio Martín H., Ana M. Vargas
