Article ID: 4948193
Journal: Neurocomputing
Published Year: 2016
Pages: 25
File Type: PDF
Abstract
As more features are introduced in pattern recognition and machine learning applications, feature selection remains a critically important task for finding the most compact representation of data, especially in unsupervised learning scenarios where class labels are scarce. Although a number of unsupervised feature selection methods exist in the literature, most of them rely on conventional distances (e.g., the Euclidean distance) to measure the similarity between two samples; because such distances are static, they cannot capture the dynamic structure of data. To reflect this dynamic structure, in this paper we propose a set of effective distance-based feature selection methods, in which a probabilistically motivated effective distance is used to measure the similarity of samples. Specifically, we first develop a sparse representation-based algorithm to compute the effective distance. We then propose three new filter-type unsupervised feature selection methods using effective distance: an effective distance-based Laplacian Score (EDLS) and two effective distance-based Sparsity Scores (EDSS-1 and EDSS-2). Experimental results on clustering and classification tasks over a series of benchmark data sets show that our effective distance-based feature selection methods achieve better performance than conventional methods using the Euclidean distance.
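The abstract does not give the paper's exact algorithm, but the pipeline it describes — sparse coefficients turned into transition probabilities, then into an effective distance — can be sketched as follows. This is a minimal illustration, assuming the common effective-distance form d_ij = 1 − log(p_ij) with p_ij obtained by row-normalising absolute sparse-representation coefficients; the function names (`sparse_codes`, `effective_distance`), the ISTA solver, and the regularisation parameter `lam` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sparse_codes(X, lam=0.1, n_iter=200):
    """Sparsely reconstruct each sample from all other samples.

    For each x_i, solve min_a 0.5*||x_i - D a||^2 + lam*||a||_1 by ISTA
    (a standard sparse-coding solver; the paper's own solver may differ),
    where the dictionary D contains all samples except x_i as columns.
    Returns an (n, n) matrix of absolute coefficients with zero diagonal.
    """
    n, _ = X.shape
    W = np.zeros((n, n))
    for i in range(n):
        D = np.delete(X, i, axis=0).T            # dictionary of the other samples
        y = X[i]
        L = np.linalg.norm(D, 2) ** 2 + 1e-12    # Lipschitz constant of the gradient
        a = np.zeros(n - 1)
        for _ in range(n_iter):
            grad = D.T @ (D @ a - y)             # gradient of the quadratic term
            a = a - grad / L                     # gradient step
            a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft-threshold
        W[i, np.arange(n) != i] = np.abs(a)
    return W

def effective_distance(W, eps=1e-12):
    """Row-normalise coefficients into probabilities p_ij, then apply the
    (assumed) effective-distance form d_ij = 1 - log(p_ij). Larger
    reconstruction weight -> higher probability -> smaller distance."""
    P = W / (W.sum(axis=1, keepdims=True) + eps)
    D = 1.0 - np.log(P + eps)
    np.fill_diagonal(D, 0.0)
    return D
```

The resulting distance matrix is generally asymmetric (how well j reconstructs i differs from the reverse), which is what lets it reflect directional, dynamic structure that a static Euclidean distance misses; it could then replace the Euclidean distance inside a Laplacian Score-style criterion.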
Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence