| Article code | Journal code | Publication year | English article | Full-text version |
|---|---|---|---|---|
| 756363 | 896152 | 2012 | 6-page PDF | Free download |

We provide a bound on the first moment of the error in the Q-function estimate resulting from fixed step-size algorithms applied to finite state-space, discounted-reward Markov decision problems. Motivated by Tsitsiklis' proof for the decreasing step-size case, we decompose the Q-learning update equations into a dynamical system driven by a noise sequence and a second dynamical system whose state variable is the Q-learning error, i.e., the difference between the true Q-function and its estimate. A natural persistence-of-excitation condition allows us to sample the system periodically and obtain a simple scalar difference equation from which the convergence properties of, and error bounds for, the Q-learning algorithm follow.
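To make the setting concrete, here is a minimal sketch of fixed step-size Q-learning on a toy MDP. The two-state, two-action MDP (`P`, `R`), the step size `ALPHA`, the behavior policy, and all numerical values are illustrative assumptions, not taken from the paper; the sketch only illustrates the update Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)) and the residual error that a constant step size leaves.

```python
import numpy as np

# A minimal sketch of fixed step-size Q-learning on a hypothetical
# two-state, two-action MDP. All constants here are assumptions chosen
# for illustration, not values from the paper.

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
GAMMA = 0.9   # discount factor
ALPHA = 0.05  # fixed step size (the paper's setting of interest)

# P[s, a, s'] = transition probability; R[s, a] = reward.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Ground-truth Q-function via value iteration, used only to measure
# the Q-learning error (true Q-function minus the estimate).
Q_true = np.zeros((n_states, n_actions))
for _ in range(2000):
    Q_true = R + GAMMA * P @ Q_true.max(axis=1)

# Fixed step-size Q-learning. A uniformly random behavior policy plays
# the role of a persistence-of-excitation condition: every state-action
# pair keeps being visited.
Q = np.zeros((n_states, n_actions))
s = 0
for t in range(200_000):
    a = rng.integers(n_actions)
    s_next = rng.choice(n_states, p=P[s, a])
    td_error = R[s, a] + GAMMA * Q[s_next].max() - Q[s, a]
    Q[s, a] += ALPHA * td_error
    s = s_next

# With a constant step size the error does not vanish; it settles in a
# neighborhood of zero whose size shrinks with ALPHA.
print("sup-norm error:", np.abs(Q - Q_true).max())
```

Rerunning the sketch with a smaller `ALPHA` shrinks the final sup-norm error but slows convergence, which is the trade-off the paper's bound quantifies.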
Journal: Systems & Control Letters - Volume 61, Issue 12, December 2012, Pages 1203–1208