k-Means clustering with a new divergence-based distance metric: Convergence and performance analysis

Article ID	Journal	Published Year	Pages	File Type
4969970	Pattern Recognition Letters	2017	9 Pages	PDF

Abstract

The choice of a proper similarity/dissimilarity measure is very important in cluster analysis for revealing the natural grouping in a given dataset. Choosing the most appropriate measure has been an open problem in cluster analysis for many years. Among the various approaches to incorporating a non-Euclidean dissimilarity measure for clustering, the use of divergence-based distance functions has recently gained attention in the context of partitional clustering. Following this direction, we propose a new point-to-point distance measure called the S-distance, motivated by the recently developed S-divergence measure (originally defined on the open cone of positive definite matrices), and discuss some of its important properties. We subsequently develop the S-k-means algorithm (with Lloyd's heuristic), which replaces the conventional Euclidean distance of k-means with the S-distance. We also provide a theoretical analysis of the S-k-means algorithm, establishing the convergence of the obtained partial optimal solutions to a locally optimal solution. The performance of S-k-means is compared with the classical k-means algorithm with the Euclidean distance metric and its feature-weighted variants using several synthetic and real-life datasets. The comparative study indicates that our results are appealing, especially when the distribution of the clusters is not regular.
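
To make the idea of swapping the distance inside k-means concrete, the following is a minimal sketch of the assignment step only. The abstract does not state the S-distance formula, so the sketch assumes the form obtained by restricting the S-divergence, S(X, Y) = log det((X + Y)/2) - (1/2) log det(XY), to diagonal matrices, which gives a sum over coordinates of strictly positive vectors. The function names `s_distance` and `assign_to_nearest` are hypothetical, and the centroid update is omitted because the abstract does not describe it.

```python
import numpy as np


def s_distance(x, y):
    """Assumed S-distance between strictly positive vectors.

    Assumption (not taken from the abstract): the S-divergence restricted
    to diagonal matrices, i.e.
        sum_l [ log((x_l + y_l) / 2) - (log x_l + log y_l) / 2 ].
    Inputs broadcast against each other; the sum runs over the last axis.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(np.log((x + y) / 2.0) - 0.5 * (np.log(x) + np.log(y)), axis=-1)


def assign_to_nearest(X, centers):
    """Lloyd-style assignment step: label each row of X with the index of
    the center that is closest under the assumed S-distance."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Broadcast to shape (n_points, n_centers, n_features), reduce to (n_points, n_centers).
    d = s_distance(X[:, None, :], centers[None, :, :])
    return d.argmin(axis=1)


# Small illustration on hypothetical data with strictly positive features.
X = np.array([[1.0, 1.0], [1.1, 0.9], [10.0, 10.0]])
centers = np.array([[1.0, 1.0], [10.0, 10.0]])
labels = assign_to_nearest(X, centers)
```

If the paper's definition applies a monotone transform to this quantity (for example a square root), the assignments above are unchanged, since the nearest center is the same under any increasing function of the distance.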

Keywords

Convergence