Article code | Journal code | Publication year | English article | Full-text version
4970155 | 1450030 | 2017 | 10-page PDF | Free download
English title of the ISI article
A highly scalable clustering scheme using boundary information
Persian translation of the title
یک طرح خوشه بندی بسیار مقیاس پذیر با استفاده از اطلاعات مرزی
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
English abstract
Many advanced clustering techniques are effective in dealing with datasets in complicated situations. However, when facing large datasets, which are increasingly common in the era of big data, the time requirements of most existing techniques can quickly become intolerable. To tackle this challenge, we propose Scalable Clustering Using Boundary Information (SCUBI), a highly flexible and scalable clustering scheme. The idea of SCUBI is to first identify the boundary points of the original dataset and then group these boundary points into suitable clusters using an existing clustering technique. Finally, each remaining point is assigned to the same cluster as its nearest boundary point. To demonstrate the effectiveness and scalability of SCUBI, we plug the well-known DBSCAN algorithm into SCUBI. Comprehensive experiments are conducted on datasets with up to two million data points to compare the clustering results and time efficiency of DBSCAN and SCUBI-DBSCAN. Experimental results show that our method obtains clustering results almost identical to those of standard DBSCAN while achieving orders-of-magnitude speedup, especially on large datasets, which confirms the scalability of SCUBI. Experiments are also performed with other clustering algorithms of high time complexity to verify the flexibility of SCUBI.
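The three steps named in the abstract (find boundary points, cluster them with an existing algorithm, assign every other point to its nearest boundary point) can be illustrated with a minimal sketch. The abstract does not say how boundary points are detected, so the score used here (`boundary_scores`, based on how lopsided a point's k nearest neighbours are) is purely an assumption for illustration, as are the names `scubi_sketch`, `eps_components`, `boundary_fraction`, and `k`. The base clusterer is a simple connected-components grouping over an eps-neighbourhood graph, standing in for DBSCAN; it is not the paper's SCUBI-DBSCAN.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def boundary_scores(points, k):
    """Hypothetical boundary score (an assumption, not the paper's method):
    length of the mean offset to the k nearest neighbours divided by their
    mean distance. Points surrounded on all sides score near 0."""
    scores = []
    for i, p in enumerate(points):
        nbrs = knn(points, i, k)
        mean_offset = [sum(points[j][d] - p[d] for j in nbrs) / k
                       for d in range(len(p))]
        mean_dist = sum(math.dist(p, points[j]) for j in nbrs) / k
        scores.append(math.hypot(*mean_offset) / mean_dist if mean_dist > 0 else 0.0)
    return scores

def eps_components(points, eps):
    """Stand-in base clusterer: connected components of the graph that links
    points at distance <= eps (a simplification of DBSCAN with no density test)."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def scubi_sketch(points, base_clusterer, boundary_fraction=0.4, k=8):
    # Step 1: keep the highest-scoring fraction of points as boundary points.
    scores = boundary_scores(points, k)
    n_boundary = max(1, int(boundary_fraction * len(points)))
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    boundary = order[:n_boundary]
    # Step 2: cluster only the boundary points with the plugged-in algorithm.
    b_labels = base_clusterer([points[i] for i in boundary])
    labels = [None] * len(points)
    for i, lab in zip(boundary, b_labels):
        labels[i] = lab
    # Step 3: every other point inherits the label of its nearest boundary point.
    for i in range(len(points)):
        if labels[i] is None:
            nearest = min(range(len(boundary)),
                          key=lambda b: math.dist(points[i], points[boundary[b]]))
            labels[i] = b_labels[nearest]
    return labels

# Two well-separated 5x5 grids of points as toy data.
blob_a = [(float(x), float(y)) for x in range(5) for y in range(5)]
blob_b = [(x + 20.0, float(y)) for x in range(5) for y in range(5)]
points = blob_a + blob_b
labels = scubi_sketch(points, lambda pts: eps_components(pts, eps=6.0))
```

The expensive base clusterer only sees the boundary subset, which is where a scheme of this shape would save time. The brute-force neighbour search above is quadratic and is kept only for readability, so this sketch does not reproduce the speedups reported in the abstract.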
Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 89, 1 April 2017, Pages 1-7