Article ID: 6863988
Journal: Neurocomputing
Published Year: 2018
Pages: 38
File Type: PDF
Abstract
The model selection strategy applied in most Extreme Learning Machine (ELM) papers relies on k-fold cross-validation combined with a grid search over two adjustable hyper-parameters, the number L of hidden ELM nodes and the regularisation parameter C, to select the pair {L, C} that minimises the validation error. However, testing only 30 values for L and 30 values for C via 10-fold cross-validation already requires building 9000 different ELM models, one per fold and candidate pair. Since these models are not independent of one another, the key to drastically reducing the computational cost of ELM model selection lies in matrix decompositions that avoid direct matrix inversion and produce matrices that can be reused across the cross-validation folds. Many matrix decompositions and cross-validation variants exist, however, giving rise to numerous possible combinations. In this paper, we identify these combinations and analyse them theoretically and experimentally to determine which is the fastest. We compare Singular Value Decomposition (SVD), Eigenvalue Decomposition (EVD), Cholesky decomposition, and QR decomposition, each of which produces reusable matrices (singular, eigenvector, triangular, and orthogonal factors). These decompositions can be combined with different cross-validation schemes, and we present a direct and thorough comparison of several k-fold cross-validation versions as well as leave-one-out cross-validation. By analysing the computational cost, we demonstrate theoretically and experimentally that while the type of matrix decomposition plays one important role, an equally important role is played by the version of cross-validation. Finally, we present a scalable and computationally effective algorithm that significantly reduces the model selection time.
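To make the reuse idea concrete, below is a minimal NumPy sketch (not the paper's code) of SVD-based reuse for a fixed L under one common regularised ELM formulation, beta_C = (H^T H + I/C)^{-1} H^T T: the SVD of the hidden-layer output matrix H is computed once and then serves every candidate C on the grid, since beta_C = V diag(s_i / (s_i^2 + 1/C)) U^T T. The function names, sigmoid activation, and grid values are illustrative assumptions.

```python
import numpy as np

def elm_hidden_output(X, W, b):
    """Hidden-layer output matrix H of a sigmoid-activation ELM.
    W (d x L) and b (L,) are the fixed random input weights and biases."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_betas_for_all_C(H, T, C_grid):
    """Output weights beta_C = (H^T H + I/C)^{-1} H^T T for every C,
    reusing one SVD of H instead of one matrix inversion per C."""
    T = np.asarray(T, dtype=float)
    if T.ndim == 1:
        T = T[:, None]
    U, s, Vt = np.linalg.svd(H, full_matrices=False)  # factorise once
    UtT = U.T @ T                                     # reused for every C
    return {C: Vt.T @ ((s / (s**2 + 1.0 / C))[:, None] * UtT)
            for C in C_grid}

# Toy usage: a single SVD serves the whole C grid for a fixed L.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
L = 30
W, b = rng.normal(size=(5, L)), rng.normal(size=L)
H = elm_hidden_output(X, W, b)
betas = elm_betas_for_all_C(H, y, C_grid=[2.0**k for k in range(-5, 6)])
```

In a full grid search of the kind the abstract describes, a factorisation like this would additionally be shared across cross-validation folds, which is precisely the design space (decomposition x cross-validation version) the paper compares.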
Related Topics: Physical Sciences and Engineering > Computer Science > Artificial Intelligence