Article ID: 535289
Journal: Pattern Recognition Letters
Published Year: 2015
Pages: 8
File Type: PDF
Abstract

• Feature selection method for unsupervised learning inspired by human learning.
• Detected features supporting complex structure not limited to clusters.
• Automatic parameter estimation alleviating the burden of manually tuning parameters.
• A scheme to assess the statistical significance of discovered data patterns.

We consider the problem of feature selection for unsupervised learning and develop a new algorithm capable of identifying informative features supporting complex structures embedded in a high-dimensional space. The development of the algorithm is inspired by human learning in detecting complex data structures. We formulate it as an optimization problem with a well-defined objective function, and solve the problem by using an iterative approach. The algorithm can be easily implemented and is computationally very efficient. We use gap statistics to estimate the parameters so that the proposed method is completely parameter-free. We also develop a scheme based on permutation tests to estimate the statistical significance of the presence of a data structure. We demonstrate the effectiveness and versatility of the algorithm by comparing it with seven existing methods on a set of synthetic datasets with a wide variety of structures and cancer microarray gene expression datasets.
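
To make the idea of a permutation test for the presence of a data structure concrete, the following is a minimal sketch in Python. It is not the scheme developed in the paper, whose details the abstract does not give. It rests on assumptions chosen purely for illustration: the test statistic is the mean nearest-neighbour distance, the null distribution is obtained by shuffling each feature column independently (which keeps the marginals but breaks any joint structure), and the names `mean_nn_distance` and `permutation_p_value` are hypothetical.

```python
import numpy as np


def mean_nn_distance(X):
    """Mean Euclidean distance from each sample to its nearest neighbour.

    Illustrative statistic only: small values suggest the samples lie on
    some compact structure (clusters, curves, rings, ...).
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each sample's distance to itself
    return dist.min(axis=1).mean()


def permutation_p_value(X, n_perm=200, seed=0):
    """Estimate how unusual the observed statistic is under a shuffled null.

    Each feature column is permuted independently, which preserves the
    per-feature marginals but destroys dependence between features.
    """
    rng = np.random.default_rng(seed)
    observed = mean_nn_distance(X)
    null = np.empty(n_perm)
    for b in range(n_perm):
        X_perm = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        null[b] = mean_nn_distance(X_perm)
    # One-sided p-value with the usual +1 correction so it is never zero.
    return (1 + np.sum(null <= observed)) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = rng.uniform(0.0, 2.0 * np.pi, size=150)
    # A noisy ring: a structure that is not a set of clusters.
    ring = np.column_stack([np.cos(t), np.sin(t)])
    ring += rng.normal(scale=0.01, size=ring.shape)
    print("p-value for ring data:", permutation_p_value(ring))
```

A small p-value indicates that the observed samples are packed more tightly than shuffled data with the same marginals, which is one simple way to flag a non-random pattern. Other statistics and null models are possible, and the choice made here is only for demonstration.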
