Online regret bounds for Markov decision processes with deterministic transitions

Article code	Journal code	Publication year	English article	Full-text version
435646	689922	2010	12-page PDF	Free download

Related topics

Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics

Abstract

We consider an upper confidence bound algorithm for learning in Markov decision processes with deterministic transitions. For this algorithm we derive upper bounds on the online regret with respect to an (ε-)optimal policy that are logarithmic in the number of steps taken. We also present a corresponding lower bound. As an application, multi-armed bandits with switching cost are considered.

Publisher

Database: Elsevier - ScienceDirect
Journal: Theoretical Computer Science - Volume 411, Issues 29–30, 17 June 2010, Pages 2684-2695
