Selection of non-zero loadings in sparse principal component analysis

Article ID	Journal	Published Year	Pages	File Type
5132259	Chemometrics and Intelligent Laboratory Systems	2017	12 Pages	PDF

Abstract

Principal component analysis (PCA) is a widely accepted procedure for summarizing data through dimensional reduction. In PCA, the selection of the appropriate number of components and the interpretation of those components have been the key challenging features. Sparse principal component analysis (SPCA) is a relatively recent technique proposed for producing principal components with sparse loadings via the variance-sparsity trade-off. Although several techniques for deriving sparse loadings have been offered, no detailed guidelines for choosing the penalty parameters to obtain a desired level of sparsity are provided. In this paper, we propose the use of a genetic algorithm (GA) to select the number of non-zero loadings (NNZL) in each principal component while using SPCA. The proposed approach considerably improves the interpretability of principal components and addresses the difficulty in the selection of NNZL in SPCA. Furthermore, we compare the performance of PCA and SPCA in uncovering the underlying latent structure of the data. The key features of the methodology are assessed through a synthetic example, pitprops data and a comparative study of the benchmark Tennessee Eastman process.

Keywords

Principal component analysis (PCA)Genetic algorithm Tennessee Eastman process