Article ID: 529933
Journal: Pattern Recognition
Published Year: 2015
Pages: 15
File Type: PDF
Abstract

• Compare common feature selection filter measures for use with specific classifiers.
• Many tested filter measures do not reliably predict classifier accuracy.
• Some measures have specific problems that cause them to select unsuitable features.
• Best feature selection filter measure is classifier specific.

Feature selection is an important part of classifier design. There are many possible methods for searching and evaluating feature subsets, but little consensus on which methods are best. This paper examines a number of filter-based feature subset evaluation measures with the goal of assessing their performance with respect to specific classifiers.

This work tests 16 common filter measures for use with K-nearest neighbors and support vector machine classifiers. The measures are tested on 20 real and 20 artificial data sets, which are designed to probe for specific challenges. The strengths and weaknesses of each measure are discussed with respect to the specific challenges and correlation with classifier accuracy. The results highlight several challenging problems with a number of common filter measures.

The results indicate that the best filter measure is classifier-specific. K-nearest neighbors classifiers work well with subset-based RELIEF, correlation feature selection or conditional mutual information maximization, whereas Fisher's interclass separability criterion and conditional mutual information maximization work better for support vector machines. Despite the large number and variety of feature selection measures proposed in the literature, no single measure is guaranteed to outperform the others, even within a single classifier, and the overall performance of a feature selection method cannot be characterized independently of the subsequent classifier.
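As a rough illustration of the filter-then-classify workflow the paper evaluates, the sketch below ranks features with a simple univariate mutual-information filter and measures cross-validated accuracy of K-nearest neighbors and SVM classifiers on the top-ranked subsets. The synthetic data set, the use of scikit-learn's mutual_info_classif as the filter measure, and the chosen subset sizes are illustrative assumptions, not the paper's 16 measures or its experimental protocol.

```python
# Minimal sketch (assumptions noted above): score features with a univariate
# mutual-information filter, then check how the selected subsets perform with
# two different classifiers, since the paper reports the best filter measure
# is classifier-specific.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data with informative, redundant, and irrelevant features,
# loosely in the spirit of artificial data sets built to probe filter behavior.
X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           n_redundant=6, random_state=0)

# Filter step: score each feature independently and rank from best to worst.
scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]

# Classifier step: evaluate the same ranked subsets with KNN and SVM.
for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM", SVC(kernel="rbf"))]:
    for k in (5, 10, 30):
        subset = ranking[:k]
        model = make_pipeline(StandardScaler(), clf)
        acc = cross_val_score(model, X[:, subset], y, cv=5).mean()
        print(f"{name} with top {k} features: mean CV accuracy {acc:.3f}")
```

Comparing the accuracy curves across the two classifiers gives a small-scale sense of the paper's point: a ranking produced by one filter measure may track accuracy well for one classifier and poorly for another.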

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition
Authors