Classification performance resulting from a 2-means

Article ID	Journal	Published Year	Pages	File Type
1148881	Journal of Statistical Planning and Inference	2013	11 Pages	PDF

Abstract

The k-means procedure is probably one of the most common nonhierarchical clustering techniques. From a theoretical point of view, it is related to the search for the k principal points of the underlying distribution. In this paper, the classification resulting from that procedure for k=2 is shown to be optimal under a balanced mixture of two spherically symmetric and homoscedastic distributions. The classification efficiency of the 2-means rule is then assessed using the second-order influence function and compared with the classification efficiencies of Fisher and logistic discrimination. Influence functions are also used to compare the robustness to infinitesimal contamination of the 2-means method with that of the generalized 2-means technique.
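The optimality claim for k=2 can be illustrated numerically. The sketch below is not taken from the paper: it assumes a balanced mixture of two bivariate Gaussians with identity covariance and means at (-delta, 0) and (delta, 0), runs a plain Lloyd-type 2-means on a simulated sample, and compares the resulting misclassification rate with that of the Bayes rule, which for this setup assigns a point by the sign of its first coordinate. The function names, sample size, seeds, and delta = 1.5 are all illustrative choices.

```python
import math
import numpy as np

def classify(x, centers):
    """Assign each row of x to the nearest of the two centers (labels 0/1)."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def two_means(x, n_iter=200, seed=0):
    """Lloyd-type iterations for k = 2; returns the centers as a (2, d) array."""
    rng = np.random.default_rng(seed)
    # Start from two distinct sample points (an arbitrary initialization).
    centers = x[rng.choice(len(x), size=2, replace=False)]
    labels = None
    for _ in range(n_iter):
        new_labels = classify(x, centers)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centers are too
        labels = new_labels
        centers = np.array([x[labels == j].mean(axis=0) for j in (0, 1)])
    return centers

def simulate(n=20000, delta=1.5, seed=1):
    """Balanced mixture of N((-delta, 0), I) and N((delta, 0), I) (assumed setup)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)  # group membership, probability 1/2 each
    mu = np.array([delta, 0.0])
    x = rng.standard_normal((n, 2)) + np.where(y[:, None] == 1, mu, -mu)
    return x, y

delta = 1.5
x, y = simulate(delta=delta)
centers = two_means(x)
pred = classify(x, centers)

# Cluster labels are arbitrary, so keep the better of the two matchings.
err_2means = min(np.mean(pred != y), np.mean(pred == y))
# Bayes rule for this symmetric setup: classify by the sign of the first coordinate.
err_bayes = np.mean((x[:, 0] > 0).astype(int) != y)
# Theoretical Bayes error, Phi(-delta), via the complementary error function.
bayes_theory = 0.5 * math.erfc(delta / math.sqrt(2))

print(f"2-means error: {err_2means:.4f}")
print(f"Bayes error (empirical): {err_bayes:.4f}")
print(f"Bayes error (theoretical): {bayes_theory:.4f}")
```

Under these assumptions the two empirical error rates should be close to each other and to the theoretical value of about 0.067. The error is measured on the same sample used for clustering, which is adequate for an illustration but is not an efficiency study of the kind carried out in the paper.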

Keywords

Cluster analysis; Influence function; k-Means error rate; Principal points; Robustness