| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 10361522 | Pattern Recognition Letters | 2005 | 9 Pages | |
Abstract
A frequent practice in feature selection is to maximize the Kullback-Leibler (K-L) distance between target classes. In this note we show that this common practice can be suboptimal, since it fails to take into account that classification is performed with a finite number of samples. In classification, the variance and higher-order moments of the likelihood function should be taken into account when selecting feature subsets; the Kullback-Leibler distance relates only to the mean separation. We derive appropriate expressions and show that they can lead to major increases in performance.
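To make the abstract's point concrete, here is a minimal sketch (not the paper's derivation) contrasting the K-L divergence with the variance of the log-likelihood ratio for two hypothetical univariate Gaussian features. Under the class distribution, the mean of the log-likelihood ratio equals the K-L divergence, while its variance is precisely what a K-L-only criterion ignores; all parameter values below are illustrative assumptions.

```python
# Hedged sketch: two features with nearly identical K-L separation can have
# very different log-likelihood-ratio (LLR) variance, which matters for
# finite-sample classification. Parameter values are illustrative only.
import numpy as np

def kl_gauss(mu_p, s_p, mu_q, s_q):
    """Closed-form KL( N(mu_p, s_p^2) || N(mu_q, s_q^2) )."""
    return np.log(s_q / s_p) + (s_p**2 + (mu_p - mu_q)**2) / (2 * s_q**2) - 0.5

def llr_moments(mu_p, s_p, mu_q, s_q, n=200_000, seed=0):
    """Monte Carlo mean and variance of the LLR under class p.
    The mean estimates the K-L divergence; the variance is the quantity
    the abstract argues should not be ignored."""
    x = np.random.default_rng(seed).normal(mu_p, s_p, size=n)
    llr = (np.log(s_q / s_p)
           + (x - mu_q)**2 / (2 * s_q**2)
           - (x - mu_p)**2 / (2 * s_p**2))
    return llr.mean(), llr.var()

# Hypothetical features: both have K-L divergence of roughly 1.28 nats,
# but feature B's variance mismatch inflates its LLR spread.
features = {
    "feature A": (0.0, 1.0, 1.60, 1.0),  # equal class variances
    "feature B": (0.0, 1.0, 3.36, 3.0),  # strong variance mismatch
}
for name, (mu_p, s_p, mu_q, s_q) in features.items():
    kl = kl_gauss(mu_p, s_p, mu_q, s_q)
    mean, var = llr_moments(mu_p, s_p, mu_q, s_q)
    print(f"{name}: KL={kl:.3f}  E[LLR]={mean:.3f}  Var[LLR]={var:.3f}")
```

Ranking by K-L divergence alone would treat these two features as equivalent; accounting for the LLR variance distinguishes them, which is the gap the note's derived expressions are meant to close.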
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Frans M. Coetzee
