Article code: 530861 | Journal code: 869796 | Publication year: 2011 | English article: 9 pages (PDF) | Full text: free download
English title of the ISI article
A distance based clustering method for arbitrary shaped clusters in large datasets
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
Abstract

Clustering is widely used in science, technology, the social sciences, and other fields. Clusters in a dataset naturally take arbitrary (non-convex) shapes. Distance based methods form one important class of clustering methods, but they usually find only clusters of convex shape. Classical single-link is a distance based clustering method that can find arbitrary shaped clusters. However, it scans the dataset multiple times and has a time requirement of $O(n^2)$, where n is the size of the dataset. This is potentially a severe problem for a large dataset. In this paper, we propose a distance based clustering method, l-SL, to find arbitrary shaped clusters in a large dataset. In this method, the leaders clustering method is first applied to the dataset to derive a set of leaders; subsequently, the single-link method (with a distance stopping criterion) is applied to the set of leaders to obtain the final clustering. The l-SL method produces a flat clustering. It is considerably faster than the single-link method applied directly to the dataset. The clustering result of l-SL may deviate nominally from the final clustering of the single-link method (with a distance stopping criterion) applied directly to the dataset. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted with standard real world and synthetic datasets. The experimental results show the effectiveness of the proposed clustering methods for large datasets.
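The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names and the parameters `tau` (leader threshold) and `h` (single-link stopping distance) are assumptions introduced here for illustration, and the abstract does not say how the two thresholds should be related. The single-link stage is written as connected components over pairs within distance `h`, which yields the same flat clustering as merging until the closest pair of clusters is farther apart than `h`. The improvement method mentioned in the abstract is not covered.

```python
import math


def leaders(points, tau):
    """One scan: a point follows the first leader within tau, else becomes a leader."""
    leader_ids = []   # indices (into points) of the chosen leaders
    assignment = []   # assignment[i] = position in leader_ids of point i's leader
    for i, p in enumerate(points):
        for pos, lid in enumerate(leader_ids):
            if math.dist(p, points[lid]) <= tau:
                assignment.append(pos)
                break
        else:
            # No existing leader is close enough, so this point becomes a leader.
            leader_ids.append(i)
            assignment.append(len(leader_ids) - 1)
    return leader_ids, assignment


def single_link_cut(points, h):
    """Flat single-link clustering: link every pair within distance h (union-find)."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            if math.dist(points[a], points[b]) <= h:
                parent[find(a)] = find(b)
    return [find(x) for x in range(len(points))]


def l_sl(points, tau, h):
    """Hypothetical l-SL sketch: cluster the leaders, then label followers by leader."""
    leader_ids, assignment = leaders(points, tau)
    leader_labels = single_link_cut([points[i] for i in leader_ids], h)
    return [leader_labels[pos] for pos in assignment]


# Toy data (made up for illustration): an elongated group and a distant pair.
data = [(0, 0), (0.1, 0), (0.2, 0), (1.0, 0), (1.1, 0), (5, 5), (5.1, 5)]
labels = l_sl(data, tau=0.25, h=1.5)
```

On this toy input the first five points share one label and the last two share another. The quadratic pairwise loop runs only over the leaders, so the cost of that stage depends on the number of leaders rather than on n.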


► Two distance based clustering methods are proposed for arbitrary shaped clusters.
► Clustering results of one method are exactly the same as those of the single-link method.
► Clustering results are analyzed experimentally and theoretically.
► The methods are significantly faster than the classical single-link method.
► The methods are highly suitable for large datasets, as they scan the dataset at most twice.

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 44, Issue 12, December 2011, Pages 2862–2870