Sparse Generalised Principal Component Analysis

Article ID	Journal	Published Year	Pages	File Type
6938749	Pattern Recognition	2018	29 Pages	PDF

Abstract

In this paper, we develop a sparse method for unsupervised dimension reduction of data drawn from an exponential-family distribution. Our idea extends previous work on Generalised Principal Component Analysis by adding L1 and SCAD penalties to introduce sparsity. We demonstrate the significance and advantages of our method with synthetic and real data examples. We focus on the application to text data, which is high-dimensional and non-Gaussian by nature, and discuss the potential advantages of our methodology in achieving dimension reduction.
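
For readers unfamiliar with the two penalties named in the abstract, the following is a minimal sketch of their standard scalar forms. It is not taken from the paper: the function names `l1_penalty` and `scad_penalty` are hypothetical, and the default `a = 3.7` is the value commonly used in the SCAD literature, assumed here for illustration. How the paper applies these penalties to the loadings is not specified in the abstract and is not reproduced here.

```python
def l1_penalty(theta, lam):
    """L1 (lasso) penalty on a single coefficient: lam * |theta|."""
    return lam * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty on a single coefficient.

    Assumption: a = 3.7 is a conventional default, not a value
    confirmed by the abstract. The penalty requires a > 2.
    """
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: identical to the L1 penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage.
    return lam * lam * (a + 1) / 2
```

The sketch illustrates the usual motivation for SCAD: it matches the L1 penalty near zero, which encourages sparsity, but levels off for large coefficients, whereas the L1 penalty keeps growing linearly.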

Keywords

62J07; 62H25; 62-09; 68T50; PCA; Exponential family; Text mining; Dimension reduction