Statistical modeling of dissimilarity increments for d-dimensional data: Application in partitional clustering

Article ID	Journal	Published Year	Pages	File Type
533357	Pattern Recognition	2012	11 Pages	PDF

Abstract

This paper addresses the use of high-order dissimilarity models in data mining problems. We explore dissimilarities between triplets of nearest neighbors, called dissimilarity increments (DIs). We derive a statistical model of DIs for d-dimensional data (d-DID), assuming that the objects follow a multivariate Gaussian distribution. Empirical evidence shows that the d-DID is well approximated by the particular case d=2. We propose applying this model to clustering, through a partitional algorithm that uses a merge strategy on Gaussian components. Experimental results on synthetic and real datasets show that clustering algorithms using DID usually outperform well-known clustering algorithms.
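As an illustration of the quantity the abstract refers to, the following is a minimal sketch of how dissimilarity increments might be computed. The abstract does not state the formula, so the definition used here is an assumption taken from earlier work on dissimilarity increments: for a point x_i with nearest neighbor x_j, and x_k the nearest neighbor of x_j other than x_i, the increment is |d(x_i, x_j) - d(x_j, x_k)|. The Euclidean distance and the function name `dissimilarity_increments` are also assumptions made for this example, not details from the paper.

```python
import numpy as np

def dissimilarity_increments(X):
    """Return one dissimilarity increment per row of X (an n x d array).

    Assumed definition (not given in the abstract):
    |d(x_i, x_j) - d(x_j, x_k)|, where x_j is the nearest neighbor of x_i
    and x_k is the nearest neighbor of x_j different from x_i.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 3:
        raise ValueError("need at least three points to form a triplet")

    # Pairwise Euclidean distances; the diagonal is set to infinity so
    # that a point is never selected as its own nearest neighbor.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)

    increments = np.empty(n)
    for i in range(n):
        j = int(np.argmin(D[i]))      # nearest neighbor of x_i
        row = D[j].copy()
        row[i] = np.inf               # x_k must differ from x_i
        k = int(np.argmin(row))       # nearest neighbor of x_j, excluding x_i
        increments[i] = abs(D[i, j] - D[j, k])
    return increments

# Four points on a line, chosen so the triplets are easy to check by hand.
inc = dissimilarity_increments([[0.0], [1.0], [3.0], [7.0]])
print(inc)
```

The dense distance matrix keeps the sketch short but costs O(n^2) memory; a nearest-neighbor index would be a more practical choice for large datasets.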

► We derive a statistical model for dissimilarity increments in d-dimensional spaces.
► We propose GMDID, a clustering algorithm that uses this model.
► We compare it with other clustering techniques on multiple synthetic and real datasets.
► The results show GMDID's superior performance.

Keywords

Likelihood-ratio test; Minimum Description Length; Partitional clustering