k-means clustering with outlier removal

Article ID	Journal	Published Year	Pages	File Type
4969980	Pattern Recognition Letters	2017	10 Pages	PDF

Abstract

Outlier detection is an important data analysis task in its own right, and removing outliers from clusters can improve clustering accuracy. In this paper, we extend the k-means algorithm to perform data clustering and outlier detection simultaneously by introducing an additional “cluster” that holds all outliers. We design an iterative procedure to optimize the objective function of the proposed algorithm and establish the convergence of this procedure. Numerical experiments on both synthetic and real data demonstrate the effectiveness and efficiency of the proposed algorithm.
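
The idea of reserving one extra "cluster" for outliers can be illustrated with a minimal sketch. The abstract does not give the objective function, the rule for flagging outliers, or any parameters, so everything specific below is an assumption made for illustration and may differ from the paper's procedure: a point is flagged when its squared distance to the nearest center exceeds `gamma` times the mean squared distance of the points not currently flagged, at most `max_outliers` points are flagged, and the names `kmeans_with_outlier_cluster`, `gamma`, `max_outliers`, and `init_centers` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_with_outlier_cluster(points, init_centers, gamma=3.0,
                                max_outliers=1, n_iter=100):
    """Illustrative sketch only (not the paper's algorithm as published).

    Labels 0..k-1 are ordinary clusters; label k is the extra "cluster"
    that holds outliers. Assumes max_outliers < len(points).
    """
    k = len(init_centers)
    centers = [[float(v) for v in c] for c in init_centers]
    labels = [-1] * len(points)  # -1 means "not assigned yet"

    for _ in range(n_iter):
        # Nearest center and its squared distance for every point.
        nearest, dmin = [], []
        for p in points:
            dists = [sq_dist(p, c) for c in centers]
            j = min(range(k), key=dists.__getitem__)
            nearest.append(j)
            dmin.append(dists[j])

        # Assumed rule: threshold = gamma * mean squared distance of the
        # points that were not flagged as outliers in the previous pass.
        inlier_d = [d for d, lab in zip(dmin, labels) if lab != k]
        threshold = gamma * sum(inlier_d) / len(inlier_d)

        # Flag the farthest points above the threshold, at most max_outliers.
        over = [i for i, d in enumerate(dmin) if d > threshold]
        over.sort(key=lambda i: dmin[i], reverse=True)
        new_labels = list(nearest)
        for i in over[:max_outliers]:
            new_labels[i] = k

        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels

        # Update each center from its non-outlier members only.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]

    return labels, centers


# Two small groups plus one distant point.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1),
          (9, 9), (9, 11), (11, 9), (11, 11),
          (100, 100)]
labels, centers = kmeans_with_outlier_cluster(points, [(0, 0), (10, 10)])
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, 2]
print(centers)  # [[0.0, 0.0], [10.0, 10.0]]
```

In this toy run the distant point receives label 2, the extra cluster, so it does not pull the second center away from its group. The initial centers are passed in explicitly to keep the example deterministic; a practical implementation would also need an initialization strategy, which this sketch leaves out.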

Keywords

68T10; 62P10; 62H30; 91C20; Outlier detection; Data clustering; k-Means