Article code: 406608
Journal code: 678101
Publication year: 2014
English article: 9-page PDF
Full-text version: Free download
English title of the ISI article
Least-squares temporal difference learning based on an extreme learning machine
Keywords
Least-squares temporal difference learning; Extreme learning machine; Reinforcement learning
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
Abstract

Reinforcement learning (RL) is a general class of algorithms for solving decision-making problems, which are usually modeled using the Markov decision process (MDP) framework. RL can find exact solutions only when the MDP state space is discrete and small enough. Because many real-world problems are described by continuous variables, approximation is essential in practical applications of RL. This paper focuses on learning the value function of a fixed policy in continuous MDPs. This is an important subproblem of several RL algorithms. We propose a least-squares temporal difference (LSTD) algorithm based on the extreme learning machine. LSTD is typically combined with local function approximators, which scale poorly with the problem dimensionality. Our approach allows us to approximate value functions using single-hidden-layer feedforward networks (SLFNs), a type of artificial neural network extensively used in many fields. Due to the global nature of SLFNs, the proposed approach is more suitable than traditional methods for high-dimensional problems. The method was empirically evaluated on a set of MDPs whose dimensionality varies from 1 to 6. For comparison purposes, experiments were replicated using a standard LSTD algorithm combined with Gaussian radial basis functions. Experimental results suggest that, although both methods can approximate value functions accurately, the proposed approach requires considerably fewer resources for the same degree of accuracy.
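
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, only one plausible reading of it: a single-hidden-layer network whose hidden weights are drawn at random and then frozen (the extreme learning machine idea), with the output weights obtained from the regularized LSTD linear system. The toy MDP (`s' = 0.9 s`, reward `r = s`), the `tanh` activation, the ridge term, and all function names are assumptions made for illustration.

```python
import math
import random


def make_hidden_layer(n_hidden, seed=0):
    """Draw random hidden weights and biases once; they are never trained."""
    rng = random.Random(seed)
    return [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            for _ in range(n_hidden)]


def features(hidden, s):
    """Hidden-layer activations for a scalar state s."""
    return [math.tanh(w * s + b) for w, b in hidden]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lstd(hidden, transitions, gamma, ridge=1e-4):
    """Fit output weights from (s, r, s_next) samples.

    Accumulates A = sum phi(s) (phi(s) - gamma * phi(s_next))^T + ridge * I
    and b = sum phi(s) * r, then solves A beta = b.
    """
    k = len(hidden)
    A = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for s, r, s_next in transitions:
        phi = features(hidden, s)
        phi_next = features(hidden, s_next)
        for i in range(k):
            b[i] += phi[i] * r
            for j in range(k):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


def value(hidden, beta, s):
    """Approximate value: linear readout of the hidden activations."""
    return sum(w * f for w, f in zip(beta, features(hidden, s)))


# Hypothetical toy problem: s' = 0.9 * s, r = s, so V(s) = s / (1 - 0.9 * gamma).
gamma = 0.9
samples = [(i / 199, i / 199, 0.9 * i / 199) for i in range(200)]
hidden = make_hidden_layer(n_hidden=10)
beta = lstd(hidden, samples, gamma)
v_hat = value(hidden, beta, 0.5)
v_true = 0.5 / (1 - 0.9 * gamma)
```

Only the output weights `beta` are fitted, so the whole procedure reduces to one linear solve whose size equals the number of hidden units, regardless of how many samples are used. Comparing `v_hat` with `v_true` gives a quick sanity check on the toy problem.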

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 141, 2 October 2014, Pages 37–45