Free article download: Least-squares temporal difference learning based on an extreme learning machine

Article code	Journal code	Publication year	English article	Full-text version
406608	678101	2014	9-page PDF	Free download

English title of the ISI article

Least-squares temporal difference learning based on an extreme learning machine



Keywords

Least-squares temporal difference learning; Extreme learning machine; Reinforcement learning

Related topics

Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence


Abstract

Reinforcement learning (RL) is a general class of algorithms for solving decision-making problems, which are usually modeled using the Markov decision process (MDP) framework. RL can find exact solutions only when the MDP state space is discrete and small enough. Because many real-world problems are described by continuous variables, approximation is essential in practical applications of RL. This paper focuses on learning the value function of a fixed policy in continuous MDPs, an important subproblem of several RL algorithms. We propose a least-squares temporal difference (LSTD) algorithm based on the extreme learning machine. LSTD is typically combined with local function approximators, which scale poorly with the problem dimensionality. Our approach allows us to approximate value functions using single-hidden-layer feedforward networks (SLFNs), a type of artificial neural network extensively used in many fields. Due to the global nature of SLFNs, the proposed approach is more suitable than traditional methods for high-dimensional problems. The method was empirically evaluated on a set of MDPs whose dimensionality varies from 1 to 6. For comparison purposes, experiments were replicated using a standard LSTD algorithm combined with Gaussian radial basis functions. Experimental results suggest that, although both methods can approximate value functions accurately, the proposed approach requires considerably fewer resources for the same degree of accuracy.
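
The abstract does not spell out the update equations, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the hidden layer of an SLFN with fixed random input weights (the extreme learning machine idea) serves as the feature map phi, and that the output weights w come from the textbook LSTD system A w = b, with A = Phi^T (Phi - gamma Phi') and b = Phi^T r. The function names, the sigmoid activation, the weight range `scale`, the optional `ridge` term, and the toy chain problem are all hypothetical choices made for illustration.

```python
import numpy as np


def make_elm_features(state_dim, n_hidden, rng, scale=3.0):
    """Build a feature map phi from a hidden layer with fixed random weights.

    Assumption: input weights and biases are drawn once and never trained,
    which is the defining trait of an extreme learning machine.
    """
    W = rng.uniform(-scale, scale, size=(state_dim, n_hidden))
    b = rng.uniform(-scale, scale, size=n_hidden)

    def phi(states):
        # states has shape (n_samples, state_dim); output is (n_samples, n_hidden)
        states = np.asarray(states, dtype=float)
        return 1.0 / (1.0 + np.exp(-(states @ W + b)))  # sigmoid activations

    return phi


def lstd(phi, states, rewards, next_states, gamma, ridge=0.0):
    """Textbook LSTD: solve A w = b for the output weights w.

    A = Phi^T (Phi - gamma * Phi_next), b = Phi^T r.
    `ridge` adds ridge * I to A; it is an optional stabiliser, not part of LSTD itself.
    """
    Phi = phi(states)
    Phi_next = phi(next_states)
    A = Phi.T @ (Phi - gamma * Phi_next)
    b = Phi.T @ np.asarray(rewards, dtype=float)
    A = A + ridge * np.eye(A.shape[0])
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gamma = 0.9
    # Hypothetical toy problem under a fixed policy: s' = 0.9 * s, reward r = s,
    # so the true value function is V(s) = s / (1 - 0.9 * gamma).
    S = rng.uniform(0.0, 1.0, size=(500, 1))
    S_next = 0.9 * S
    R = S[:, 0]

    phi = make_elm_features(state_dim=1, n_hidden=10, rng=rng)
    w = lstd(phi, S, R, S_next, gamma, ridge=1e-8)

    V_hat = phi(S) @ w
    V_true = S[:, 0] / (1.0 - 0.9 * gamma)
    print("max abs error:", np.max(np.abs(V_hat - V_true)))
```

Because only the output weights are solved for, the whole fit reduces to one linear system whose size equals the number of hidden units, independent of the number of samples. How the hidden-layer size and weight range should be chosen for a given problem is not stated in the abstract and is left open here.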

Publisher

Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 141, 2 October 2014, Pages 37–45

Authors

Pablo Escandell-Montero, José M. Martínez-Martínez, José D. Martín-Guerrero, Emilio Soria-Olivas, Juan Gómez-Sanchis
