Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
435830	689942	2008	17 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات

پیش نمایش صفحه اول مقاله

Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments

چکیده انگلیسی

The nonstochastic multi-armed bandit problem, first studied by Auer, Cesa-Bianchi, Freund, and Schapire in 1995, is a game of repeatedly choosing one decision from a set of decisions (“experts”), under partial observation: In each round t, only the cost of the decision played is observable. A regret minimization algorithm plays this game while achieving sublinear regret relative to each decision. It is known that an adversary controlling the costs of the decisions can force the player a regret growing as in the time t. In this work, we propose the first algorithm for a countably infinite set of decisions, that achieves a regret upper bounded by , i.e. arbitrarily close to optimal order. To this aim, we build on the “follow the perturbed leader” principle, which dates back to work by Hannan in 1957. Our results hold against an adaptive adversary, for both the expected and high probability regret of the learner w.r.t. each decision. In the second part of the paper, we consider reactive problem settings, that is, situations where the learner’s decisions impact on the future behaviour of the adversary, and a strong strategy can draw benefit from well chosen past actions. We present a variant of our regret minimization algorithm which has still regret of order at most relative to such strong strategies, and even sublinear regret not exceeding w.r.t. the hypothetical (without external interference) performance of a strong strategy. We show how to combine the regret minimizer with a universal class of experts, given by the countable set of programs on some fixed universal Turing machine. This defines a universal learner with sublinear regret relative to any computable strategy.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Theoretical Computer Science - Volume 397, Issues 1–3, 20 May 2008, Pages 77-93

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Nonstochastic bandits: Countable decision set, unbounded costs and reactive environments

دسترسی سریع

ارتباط

English Website