Article ID: 4948193
Journal: Neurocomputing
Published Year: 2016
Pages: 25
File Type: PDF
Abstract
As more features are introduced in pattern recognition and machine learning applications, feature selection remains a critically important task for finding the most compact representation of data, especially in unsupervised learning scenarios where class labels are scarce. Although a number of unsupervised feature selection methods exist in the literature, most of them rely on conventional distances (e.g., the Euclidean distance) to measure the similarity between two samples; because such distances are static, they cannot capture the dynamic structure of data. To reflect this dynamic structure, in this paper we propose a set of effective distance-based feature selection methods, in which a probabilistically motivated effective distance is used to measure the similarity of samples. Specifically, we first develop a sparse representation-based algorithm to compute the effective distance. We then propose three new filter-type unsupervised feature selection methods using effective distance: an effective distance-based Laplacian Score (EDLS) and two effective distance-based Sparsity Scores (EDSS-1 and EDSS-2). Experimental results on clustering and classification tasks over a series of benchmark data sets show that our effective distance-based feature selection methods achieve better performance than conventional methods using the Euclidean distance.
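The abstract does not give the paper's exact algorithm, but the pipeline it describes — sparse coefficients turned into transition probabilities, then into an effective distance — can be sketched as follows. This is a minimal illustration, assuming the common effective-distance form d_ij = 1 − log(p_ij) with p_ij obtained by row-normalising absolute sparse-representation coefficients; the function names (`sparse_codes`, `effective_distance`), the ISTA solver, and the regularisation parameter `lam` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sparse_codes(X, lam=0.1, n_iter=200):
    """Sparsely reconstruct each sample from all other samples.

    For each x_i, solve min_a 0.5*||x_i - D a||^2 + lam*||a||_1 by ISTA
    (a standard sparse-coding solver; the paper's own solver may differ),
    where the dictionary D contains all samples except x_i as columns.
    Returns an (n, n) matrix of absolute coefficients with zero diagonal.
    """
    n, _ = X.shape
    W = np.zeros((n, n))
    for i in range(n):
        D = np.delete(X, i, axis=0).T            # dictionary of the other samples
        y = X[i]
        L = np.linalg.norm(D, 2) ** 2 + 1e-12    # Lipschitz constant of the gradient
        a = np.zeros(n - 1)
        for _ in range(n_iter):
            grad = D.T @ (D @ a - y)             # gradient of the quadratic term
            a = a - grad / L                     # gradient step
            a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft-threshold
        W[i, np.arange(n) != i] = np.abs(a)
    return W

def effective_distance(W, eps=1e-12):
    """Row-normalise coefficients into probabilities p_ij, then apply the
    (assumed) effective-distance form d_ij = 1 - log(p_ij). Larger
    reconstruction weight -> higher probability -> smaller distance."""
    P = W / (W.sum(axis=1, keepdims=True) + eps)
    D = 1.0 - np.log(P + eps)
    np.fill_diagonal(D, 0.0)
    return D
```

The resulting distance matrix is generally asymmetric (how well j reconstructs i differs from the reverse), which is what lets it reflect directional, dynamic structure that a static Euclidean distance misses; it could then replace the Euclidean distance inside a Laplacian Score-style criterion.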
Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence