Article ID: 4969970
Journal: Pattern Recognition Letters
Published Year: 2017
Pages: 9
File Type: PDF
Abstract
The choice of a proper similarity/dissimilarity measure is very important in cluster analysis for revealing the natural grouping in a given dataset. Choosing the most appropriate measure has been an open problem in cluster analysis for many years. Among the various approaches to incorporating a non-Euclidean dissimilarity measure into clustering, the use of divergence-based distance functions has recently gained attention in the context of partitional clustering. Following this direction, we propose a new point-to-point distance measure called the S-distance, motivated by the recently developed S-divergence measure (originally defined on the open cone of positive definite matrices), and discuss some of its important properties. We subsequently develop the S-k-means algorithm (with Lloyd's heuristic), which replaces the conventional Euclidean distance of k-means with the S-distance. We also provide a theoretical analysis of the S-k-means algorithm, establishing the convergence of the obtained partial optimal solutions to a locally optimal solution. The performance of S-k-means is compared with that of the classical k-means algorithm with the Euclidean distance metric and its feature-weighted variants on several synthetic and real-life datasets. The comparative study indicates that S-k-means gives appealing results, especially when the clusters are not regularly distributed.
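The abstract does not state the S-distance formula, so the following is only a minimal sketch under an assumption: the S-distance is taken to be the square root of the S-divergence restricted to diagonal positive definite matrices, that is, to vectors with strictly positive entries. The function names `s_distance` and `assign_clusters` and the toy data are hypothetical. Only the assignment step of a Lloyd-style iteration is shown; the centroid update is left out because the abstract does not describe it.

```python
import numpy as np


def s_distance(x, y):
    """S-distance between two vectors with strictly positive entries.

    Assumed form (not stated in the abstract): the square root of the
    S-divergence restricted to diagonal positive definite matrices,
        sum_l [ log((x_l + y_l) / 2) - (log x_l + log y_l) / 2 ].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("S-distance needs strictly positive entries")
    terms = np.log((x + y) / 2.0) - 0.5 * (np.log(x) + np.log(y))
    # Each term is >= 0 by the AM-GM inequality; clip tiny negative rounding errors.
    return float(np.sqrt(np.clip(terms, 0.0, None).sum()))


def assign_clusters(points, centers):
    """Assignment step of a Lloyd-style iteration: nearest center under s_distance."""
    return np.array([
        int(np.argmin([s_distance(p, c) for c in centers]))
        for p in points
    ])


# Hypothetical toy data: two well-separated groups of positive vectors.
points = np.array([[1.0, 1.2], [0.9, 1.0], [8.0, 9.5], [9.0, 10.0]])
centers = np.array([[1.0, 1.0], [9.0, 9.0]])
labels = assign_clusters(points, centers)
```

Under this assumed form, the distance is symmetric, is zero for identical points, and is unchanged when both points are multiplied by the same positive constant, which is one way it differs from the Euclidean distance.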