Article ID: 533357
Journal: Pattern Recognition
Published Year: 2012
Pages: 11 Pages
File Type: PDF
Abstract

This paper addresses the use of high-order dissimilarity models in data mining problems. We explore dissimilarities between triplets of nearest neighbors, called dissimilarity increments (DIs). We derive a statistical model of DIs for d-dimensional data (d-DID), assuming that the objects follow a multivariate Gaussian distribution. Empirical evidence shows that the d-DID is well approximated by the particular case d=2. We propose applying this model to clustering, through a partitional algorithm that uses a merge strategy on Gaussian components. Experimental results on synthetic and real datasets show that clustering algorithms using DID usually outperform well-known clustering algorithms.

► We derive a statistical model for dissimilarity increments in d-dimensional spaces.
► We propose GMDID: a clustering algorithm using this model.
► We compare with other clustering techniques on multiple synthetic and real datasets.
► The results show GMDID's superior performance.
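
To make the notion of a dissimilarity increment concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not spell out the formula, so the sketch assumes a common definition: for a point x_i with nearest neighbor x_j, and x_k the nearest neighbor of x_j other than x_i, the increment is |d(x_i, x_j) - d(x_j, x_k)|. The Euclidean distance and the function name `dissimilarity_increments` are illustrative assumptions.

```python
import math


def dissimilarity_increments(points):
    """Return one dissimilarity increment per point.

    Assumed definition (not stated in the abstract): for x_i, take its
    nearest neighbor x_j, then x_k, the nearest neighbor of x_j other
    than x_i, and compute |d(x_i, x_j) - d(x_j, x_k)|.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points to form a triplet")

    def nearest(i, exclude):
        # Index of the point closest to points[i], skipping i itself
        # and any index listed in `exclude`.
        candidates = [j for j in range(n) if j != i and j not in exclude]
        return min(candidates, key=lambda j: math.dist(points[i], points[j]))

    increments = []
    for i in range(n):
        j = nearest(i, exclude=())    # nearest neighbor of x_i
        k = nearest(j, exclude=(i,))  # nearest neighbor of x_j other than x_i
        d_ij = math.dist(points[i], points[j])
        d_jk = math.dist(points[j], points[k])
        increments.append(abs(d_ij - d_jk))
    return increments


# Four collinear points with growing gaps (1, 2, 4).
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
increments = dissimilarity_increments(points)
print(increments)
```

The brute-force neighbor search is quadratic in the number of points, which keeps the sketch short; a spatial index would be the natural replacement for larger datasets. Fitting the d-DID model to the resulting increments is beyond what the abstract specifies and is not attempted here.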

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition