Article ID: 563050
Journal: Signal Processing
Published Year: 2013
Pages: 13
File Type: PDF
Abstract

• Novel resampling-based method.
• Choice of the best classifier among a set of candidates.
• Estimation of the optimal feature set dimensionality.
• Algorithm tested on synthetic and real data.

We address two fundamental design issues of a classification system: the choice of the classifier and the dimensionality of the optimal feature subset. Resampling techniques are applied to estimate two distributions: the probability distribution of the misclassification rate (or any other figure of merit of a classifier) conditioned on the size of the feature set, and the probability distribution of the optimal dimensionality given a classification system and a misclassification rate. The latter yields confidence intervals for the optimal feature set size; the former is the basis for a proposed quality assessment of classifier performance. Traditionally, classification systems are compared for a fixed feature set, although a different set may give different results. The proposed method compares classifiers independently of any pre-selected feature set. The algorithms are tested on 80 sets of synthetic examples and six standard databases of real data. Results on the simulated data are verified by an exhaustive search for the optimum, and results on the real data sets by two feature selection algorithms.
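The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes bootstrap resampling with out-of-bag error, a nearest-centroid classifier, synthetic Gaussian data, and feature subsets formed by taking the first d features of a fixed ordering. All function names (`make_data`, `centroid_error`, `bootstrap_study`, `percentile_interval`) are hypothetical and chosen for illustration only.

```python
import random

def make_data(n, p, n_informative, rng):
    """Synthetic two-class data (assumption): only the first
    n_informative features have class-dependent means."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(float(label) if j < n_informative else 0.0, 1.0)
                  for j in range(p)])
        y.append(label)
    return X, y

def centroid_error(X_tr, y_tr, X_te, y_te, d):
    """Misclassification rate of a nearest-centroid classifier
    restricted to the first d features."""
    cents = {}
    for c in (0, 1):
        rows = [x[:d] for x, t in zip(X_tr, y_tr) if t == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    wrong = 0
    for x, t in zip(X_te, y_te):
        dist = {c: sum((a - b) ** 2 for a, b in zip(x[:d], m))
                for c, m in cents.items()}
        wrong += min(dist, key=dist.get) != t
    return wrong / len(y_te)

def bootstrap_study(X, y, n_resamples, rng):
    """For each bootstrap resample, record the out-of-bag error for every
    feature set size d and the size that minimises it."""
    n, p = len(X), len(X[0])
    errors = {d: [] for d in range(1, p + 1)}
    best_dims = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        y_tr = [y[i] for i in idx]
        if not oob or len(set(y_tr)) < 2:
            continue  # skip degenerate resamples
        X_tr = [X[i] for i in idx]
        X_te = [X[i] for i in oob]
        y_te = [y[i] for i in oob]
        errs = {d: centroid_error(X_tr, y_tr, X_te, y_te, d) for d in errors}
        for d, e in errs.items():
            errors[d].append(e)
        # Ties are resolved in favour of the smallest d.
        best_dims.append(min(errs, key=errs.get))
    return errors, best_dims

def percentile_interval(values, level=0.95):
    """Simple percentile interval from the empirical distribution."""
    s = sorted(values)
    k = len(s)
    a = (1.0 - level) / 2.0
    return s[int(a * k)], s[min(k - 1, int((1.0 - a) * k))]

rng = random.Random(0)
X, y = make_data(80, 6, 3, rng)
errors, best_dims = bootstrap_study(X, y, 60, rng)
lo, hi = percentile_interval(best_dims)
mean_err = {d: sum(v) / len(v) for d, v in errors.items()}
print("mean out-of-bag error by feature set size:", mean_err)
print("95% percentile interval for the optimal size:", (lo, hi))
```

Here `errors[d]` holds an empirical sample of the misclassification rate for feature set size d, and `best_dims` holds an empirical sample of the optimal dimensionality, from which the interval is read off. Running the same study with a second classifier and comparing the two error distributions across all d would be one way to compare classifiers without fixing a feature set in advance.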

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing