کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
396585 | 670398 | 2011 | 22 صفحه PDF | دانلود رایگان |

We study the problem of clustering data objects with location uncertainty. In our model, a data object is represented by an uncertainty region over which a probability density function (pdf) is defined. One method to cluster such uncertain objects is to apply the UK-means algorithm [1], an extension of the traditional K-means algorithm, which assigns each object to the cluster whose representative has the smallest expected distance from it. For arbitrary pdf, calculating the expected distance between an object and a cluster representative requires expensive integration of the pdf. We study two pruning methods: pre-computation (PC) and cluster shift (CS) that can significantly reduce the number of integrations computed. Both pruning methods rely on good bounding techniques. We propose and evaluate two such techniques that are based on metric properties (Met) and trigonometry (Tri). Our experimental results show that Tri offers a very high pruning power. In some cases, more than 99.9% of the expected distance calculations are pruned. This results in a very efficient clustering algorithm. 1
Research Highlights
► We propose pruning methods to speed up UK-means for clustering data with uncertainty.
► The pre-computation (PC) method promotes reuse of pre-computed values.
► The cluster shift (CS) method utilizes calculations made during clustering.
► The methods are helped by bounds derived from metric and trigonometric properties.
► In some cases the methods can prune more than 99.9% expected distance calculations.
Journal: Information Systems - Volume 36, Issue 2, April 2011, Pages 476–497