Paper: Regret bounds for restless Markov bandits

Article ID	Journal ID	Year	Format
434060	689675	2014	15-page PDF (English)

English title

Regret bounds for restless Markov bandits

Keywords

Restless bandits, Markov decision processes, regret

Related subjects

Engineering and Basic Sciences; Computer Engineering; Computational Theory and Mathematics

Abstract

We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information, we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning. Applied to the restless bandit setting, this algorithm achieves after any T steps regret of order Õ(√T) with respect to the best policy that knows the distributions of all arms. We make no assumptions on the Markov chains underlying each arm except that they are irreducible. In addition, we show that index-based policies are necessarily suboptimal for the considered problem.
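
To make the "restless" setting concrete, the following is a minimal Python sketch under our own assumptions; it is not the authors' algorithm. It assumes each arm's transition matrix Ps[i] and state-reward vector rs[i] are known (in the paper they are unknown and must be learned), and it tracks each arm by a hypothetical belief state: the arm's last observed state plus the number of steps since it was observed, whose current-state distribution is e_s P^n. The helper names belief and simulate_myopic are illustrative only. The greedy rule scores every arm from its own belief alone, so it is an index-based policy and, per the abstract, cannot be optimal in general; it merely happens to be optimal in the toy example below.

```python
import numpy as np


def belief(P, last_state, steps):
    """Distribution of an arm's current state, given that it was last observed
    in `last_state` exactly `steps` time steps ago: the row vector e_s P^steps."""
    e = np.zeros(P.shape[0])
    e[last_state] = 1.0
    return e @ np.linalg.matrix_power(P, steps)


def simulate_myopic(Ps, rs, T, init=None, seed=0):
    """Run a greedy (myopic) index policy on a restless bandit for T steps.

    Ps[i] is arm i's transition matrix and rs[i] its state-reward vector;
    both are ASSUMED known here. Returns the total collected reward.
    """
    rng = np.random.default_rng(seed)
    K = len(Ps)
    # True arm states; hidden from the policy except for the arm it pulls.
    if init is not None:
        states = list(init)
    else:
        states = [int(rng.integers(P.shape[0])) for P in Ps]
    last_obs = list(states)  # assumption: all initial states are observed at t = 0
    age = [0] * K            # steps since each arm's state was last observed
    total = 0.0
    for _ in range(T):
        # Expected immediate reward of each arm under its belief state.
        scores = [float(belief(Ps[i], last_obs[i], age[i]) @ rs[i]) for i in range(K)]
        a = int(np.argmax(scores))
        total += float(rs[a][states[a]])
        last_obs[a] = states[a]  # only the pulled arm's state is revealed
        # Restless: every arm transitions, whether it was pulled or not.
        for i in range(K):
            states[i] = int(rng.choice(Ps[i].shape[0], p=Ps[i][states[i]]))
        age = [n + 1 for n in age]
        age[a] = 1
    return total


if __name__ == "__main__":
    # Two deterministic "alternating" arms, started out of phase.
    P = np.array([[0.0, 1.0], [1.0, 0.0]])
    r = np.array([0.0, 1.0])
    print(simulate_myopic([P, P], [r, r], T=10, init=[0, 1]))  # 10.0
```

In the alternating example, either arm pulled on its own averages reward 1/2 per step, while switching between the out-of-phase arms collects 1 per step. This is why the regret benchmark in the abstract is the best policy that knows the arms' distributions rather than the best single arm.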

Publisher

Database: Elsevier - ScienceDirect
Journal: Theoretical Computer Science - Volume 558, 13 November 2014, Pages 62–76

Authors

Ronald Ortner, Daniil Ryabko, Peter Auer, Rémi Munos