Article ID Journal Published Year Pages File Type
4947121 Neurocomputing 2017 26 Pages PDF
Abstract
Extreme learning machine (ELM) has been intensively studied during the last decade due to its high efficiency, effectiveness and ease of implementation. Recently, a variant of ELM named local receptive fields based ELM (ELM-LRF) has been proposed, which reduces the global connections and introduces local receptive fields to the input layer. However, an ELM-LRF model with a large number of hidden neurons spends considerable time solving a large-scale Moore-Penrose Matrix Inversion (MPMI) problem, which has heavy computational cost and requires much more runtime memory. Moreover, this procedure cannot be directly accelerated on GPU platforms due to the limited memory of GPU devices. In this paper, we propose three efficient approaches to performing ELM-LRF on a GPU platform. First, we propose a novel blocked LU decomposition algorithm, which overcomes the limitation of global memory size so that ELM-LRF models of any size can be trained. Furthermore, an efficient blocked Cholesky decomposition algorithm is presented that accelerates the blocked LU decomposition algorithm by exploiting the matrix characteristics of the ELM-LRF model. Finally, we present a heterogeneous blocked CPU-GPU parallel algorithm that fully exploits the resources of a GPU node to further accelerate the blocked Cholesky decomposition algorithm in the ELM-LRF model.
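The abstract's move from LU to Cholesky decomposition rests on the fact that the matrix inverted when solving for ELM output weights, H^T H + I/C (a regularized Gram matrix), is symmetric positive definite. A minimal NumPy sketch of that solve is below; it is an illustration of the general technique, not the paper's blocked GPU implementation, and the names `elm_output_weights` and `C` are assumptions for this example.

```python
import numpy as np

def elm_output_weights(H, T, C=1.0):
    """Solve beta = (H^T H + I/C)^{-1} H^T T via Cholesky factorization.

    H: (n_samples, n_hidden) hidden-layer feature matrix
    T: (n_samples, n_outputs) target matrix
    C: regularization constant (hypothetical name for this sketch)
    """
    n_hidden = H.shape[1]
    A = H.T @ H + np.eye(n_hidden) / C   # symmetric positive definite
    L = np.linalg.cholesky(A)            # A = L @ L.T
    b = H.T @ T
    # Two triangular solves replace the explicit Moore-Penrose inverse.
    y = np.linalg.solve(L, b)            # forward substitution: L y = b
    beta = np.linalg.solve(L.T, y)       # back substitution: L^T beta = y
    return beta

rng = np.random.default_rng(0)
H = rng.standard_normal((200, 50))
T = rng.standard_normal((200, 3))
beta = elm_output_weights(H, T, C=10.0)
# Cross-check against the direct normal-equation solution.
ref = np.linalg.solve(H.T @ H + np.eye(50) / 10.0, H.T @ T)
print(np.allclose(beta, ref))
```

Because Cholesky factors a symmetric matrix into a single triangular factor, it roughly halves the arithmetic of LU and stores only one triangle, which is the kind of saving the blocked GPU variant in the paper builds on.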
Keywords
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors