Determination of the optimal number of features for quadratic discriminant analysis via the normal approximation to the discriminant distribution

Article ID	Journal	Published Year	Pages	File Type
10360464	Pattern Recognition	2005	19 Pages	PDF

Abstract

Our goal is to find an essentially analytic method to produce an error curve as a function of the number of features so that the curve can be minimized to determine an optimal number of features. We use a normal approximation to the distribution of the estimated discriminant. Since the mean and variance of the estimated discriminant will be exact, these provide insight into how the covariance matrices affect the optimal number of features. We derive the mean and variance of the estimated discriminant and compare feature-size optimization using the normal approximation to the estimated discriminant with optimization obtained by simulating the true distribution of the estimated discriminant. Optimization via the normal approximation provides huge computational savings in comparison to optimization via simulation of the true distribution. Feature-size optimization via the normal approximation is very accurate when the covariance matrices differ modestly. When the covariance matrices disagree strongly, the optimal number of features based on the normal approximation will exceed the actual optimal number; however, this difference is not important, because the true misclassification errors obtained with the number of features chosen via the normal approximation and with the number chosen via the true distribution differ only slightly, even for significantly different covariance matrices.
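The feature-size selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas for the mean and variance of the estimated discriminant, so the sketch takes them as inputs. The function names, the equal-prior default, and the sign convention (assign to class 0 when the discriminant is positive) are all assumptions made for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_error_curve(mean0, var0, mean1, var1, prior0=0.5):
    """Approximate misclassification error for each candidate feature count.

    Assumed convention: an observation is assigned to class 0 when the
    estimated discriminant W is positive, to class 1 otherwise.
    mean0[k], var0[k] are the mean and variance of W for class-0
    observations when k + 1 features are used; mean1, var1 likewise for
    class 1. These moments must be supplied by the caller.
    """
    errors = []
    for m0, v0, m1, v1 in zip(mean0, var0, mean1, var1):
        # P(W <= 0 | class 0), treating W as normal with moments (m0, v0)
        err0 = normal_cdf(-m0 / math.sqrt(v0))
        # P(W > 0 | class 1), treating W as normal with moments (m1, v1)
        err1 = normal_cdf(m1 / math.sqrt(v1))
        errors.append(prior0 * err0 + (1.0 - prior0) * err1)
    return errors


def optimal_feature_count(errors):
    """Return the 1-based feature count that minimizes the error curve."""
    best_index = min(range(len(errors)), key=errors.__getitem__)
    return best_index + 1
```

Because each point on the curve costs only two evaluations of the normal CDF, the whole curve is cheap to compute once the moments are known, which is in line with the computational savings over simulation that the abstract reports.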

Keywords

Quadratic discriminant analysis