Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4969970 | Pattern Recognition Letters | 2017 | 9 Pages |
Abstract
The choice of a proper similarity/dissimilarity measure is very important in cluster analysis for revealing the natural grouping in a given dataset. Choosing the most appropriate measure has been an open problem in cluster analysis for many years. Among the various approaches to incorporating a non-Euclidean dissimilarity measure into clustering, the use of divergence-based distance functions has recently gained attention in the context of partitional clustering. Following this direction, we propose a new point-to-point distance measure called the S-distance, motivated by the recently developed S-divergence measure (originally defined on the open cone of positive definite matrices), and discuss some of its important properties. We subsequently develop the S-k-means algorithm (with Lloyd's heuristic), which replaces the conventional Euclidean distance of k-means with the S-distance. We also provide a theoretical analysis of the S-k-means algorithm, establishing the convergence of the obtained partial optimal solutions to a locally optimal solution. The performance of S-k-means is compared with that of the classical k-means algorithm with the Euclidean distance metric and its feature-weighted variants on several synthetic and real-life datasets. The comparative study indicates that our results are appealing, especially when the distribution of the clusters is not regular.
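The abstract does not state the S-distance formula, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the S-distance is the coordinate-wise analogue of the S-divergence on strictly positive vectors, d_S(x, y) = Σ_l [log((x_l + y_l)/2) − (log x_l + log y_l)/2]. It also assumes each cluster center is obtained by a fixed-point iteration derived from setting the derivative of the summed distance to zero. The names `s_distance`, `s_centroid`, and `s_k_means` are hypothetical.

```python
import math

def s_distance(x, y):
    # Assumed form: sum over coordinates of log(arithmetic mean) - mean of logs.
    # Requires strictly positive coordinates; zero exactly when x == y.
    return sum(
        math.log((a + b) / 2.0) - 0.5 * (math.log(a) + math.log(b))
        for a, b in zip(x, y)
    )

def s_centroid(points, iters=100, tol=1e-10):
    # Assumed update: per coordinate, solve sum_i 1/(x_i + c) = n / (2c)
    # with the fixed-point iteration c <- n / (2 * sum_i 1/(x_i + c)).
    n, dim = len(points), len(points[0])
    center = []
    for l in range(dim):
        c = sum(p[l] for p in points) / n  # start from the arithmetic mean
        for _ in range(iters):
            new_c = n / (2.0 * sum(1.0 / (p[l] + c) for p in points))
            converged = abs(new_c - c) < tol
            c = new_c
            if converged:
                break
        center.append(c)
    return center

def s_k_means(points, initial_centers, max_iter=50):
    # Lloyd-style alternation: assign each point to the nearest center
    # under the S-distance, then recompute each center from its members.
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: s_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: stop
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = s_centroid(members)
    return labels, centers

points = [[1.0, 1.0], [1.2, 0.9], [10.0, 12.0], [11.0, 10.0]]
labels, centers = s_k_means(points, [[1.0, 1.0], [10.0, 10.0]])
```

Under this assumed form the distance is symmetric and, by the AM-GM inequality, non-negative. Unlike the Euclidean case, the center has no closed form here, which is why the sketch iterates instead of averaging.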
Keywords
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Vision and Pattern Recognition
Authors
Saptarshi Chakraborty, Swagatam Das