New algorithms of the Q-learning type

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
697381	890367	2008	9 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Q-learning SPSA Markov decision processes - پروسه تصمیم گیری مارکوف Reinforcement learning - یادگیری تقویتی

موضوعات مرتبط

مهندسی و علوم پایه سایر رشته های مهندسی کنترل و سیستم های مهندسی

پیش نمایش صفحه اول مقاله

چکیده انگلیسی

We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state–action pairs at each instant while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms on an application of routing in communication networks are presented on a few different settings.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Automatica - Volume 44, Issue 4, April 2008, Pages 1111–1119

نویسندگان

Shalabh Bhatnagar, K. Mohan Babu,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

New algorithms of the Q-learning type

دسترسی سریع

ارتباط

English Website