Article ID Journal Published Year Pages File Type
6862438 Knowledge-Based Systems 2015 14 Pages PDF
Abstract
Feature selection techniques are attracting increasing attention with the growing number of domains that produce high-dimensional data. Due to the absence of class labels, many researchers focus on the unsupervised scenario, attempting to find an optimal feature subset that preserves the original data distribution. However, existing methods either fail to achieve sparsity or ignore the potential redundancy among features. In this paper, we propose a novel unsupervised feature selection algorithm that retains the structure-preserving power while enforcing high sparsity and low redundancy in a unified manner. On the one hand, to preserve the data structure of the whole feature set, we build the graph Laplacian matrix and learn pseudo class labels through spectral analysis. By learning a feature weight matrix, we map the original data into a low-dimensional space guided by the pseudo labels. On the other hand, to ensure sparsity and low redundancy simultaneously, we introduce a novel regularization term into the objective function with nonnegativity constraints imposed, which can be viewed as a combination of the matrix norms ||·||_{m1} and ||·||_{m2} on the feature weights. An iterative multiplicative algorithm with proven convergence is designed to efficiently solve the constrained optimization problem. Extensive experimental results on different real-world data sets demonstrate the promising performance of the proposed method compared with state-of-the-art approaches.
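The pipeline the abstract describes can be sketched in two steps: (1) build a graph Laplacian from the data and use its bottom eigenvectors as pseudo class labels, then (2) learn a nonnegative feature weight matrix by multiplicative updates and rank features by their weight norms. The sketch below is a minimal illustration under simplifying assumptions, not the paper's exact formulation: the combined ||·||_{m1}/||·||_{m2} regularizer is replaced by a simpler elastic-net-style penalty, and all parameter names (`alpha`, `beta`, `n_neighbors`) are hypothetical defaults.

```python
import numpy as np

def spectral_pseudo_labels(X, n_neighbors=5, n_clusters=3):
    """Build a kNN graph Laplacian and return its bottom eigenvectors
    as pseudo class labels (standard spectral embedding, assumed here
    as a stand-in for the paper's spectral analysis step)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    S = np.exp(-d2 / (d2.mean() + 1e-12))                # heat-kernel affinity
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]   # skip self (col 0)
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, idx.ravel()] = S[rows, idx.ravel()]
    W = np.maximum(W, W.T)                               # symmetrize the graph
    L = np.diag(W.sum(1)) - W                            # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:n_clusters + 1]                     # drop trivial eigenvector

def select_features(X, F, alpha=0.1, beta=0.1, n_iter=200):
    """Learn a nonnegative weight matrix W roughly minimizing
    ||X W - F||_F^2 + alpha*sum(W) + beta*||W||_F^2 by NMF-style
    multiplicative updates (a simplified surrogate for the paper's
    combined-norm objective)."""
    d, c = X.shape[1], F.shape[1]
    W = np.random.default_rng(0).random((d, c)) + 0.1
    XtX, XtF = X.T @ X, X.T @ F
    # split matrices into nonnegative parts so the update keeps W >= 0
    XtXp, XtXn = (np.abs(XtX) + XtX) / 2, (np.abs(XtX) - XtX) / 2
    XtFp, XtFn = (np.abs(XtF) + XtF) / 2, (np.abs(XtF) - XtF) / 2
    for _ in range(n_iter):
        num = XtFp + XtXn @ W
        den = XtFn + XtXp @ W + alpha / 2 + beta * W + 1e-12
        W *= num / den
    return W

# Toy usage: rank 6 features on 30 random samples.
X = np.random.default_rng(1).random((30, 6))
F = spectral_pseudo_labels(X, n_neighbors=5, n_clusters=3)
W = select_features(X, F)
scores = np.linalg.norm(W, axis=1)          # row norm = feature importance
ranking = np.argsort(-scores)               # most important features first
```

Ranking by the row norms of W mirrors the common practice in embedding-based feature selection: a feature whose entire weight row is driven toward zero by the sparsity penalty is considered uninformative and dropped.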
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence