Article code: 4961103
Journal code: 1446508
Publication year: 2017
English article: 6 pages, PDF
Full-text version: Free download
English title of the ISI article
Parallel Implementation of Density Peaks Clustering Algorithm Based on Spark
Persian translation of the title
اجرای موازی الگوریتم خوشه بندی چگالی با استفاده از جرقه
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

Clustering algorithms are widely used in data mining. They attempt to partition elements into several clusters so that elements in the same cluster are similar to each other, while elements belonging to different clusters are dissimilar. The recently published density peaks clustering algorithm overcomes a disadvantage of distance-based algorithms, which can only find clusters of nearly circular shape: it can discover clusters of arbitrary shape and is insensitive to noisy data. However, it needs to calculate the distances between all pairs of data points and therefore does not scale to big data. To reduce the computational cost of the algorithm, we propose an efficient distributed density peaks clustering algorithm based on Spark's GraphX. This paper demonstrates the effectiveness of the method on two different data sets. The experimental results show that our system can improve performance significantly (up to 10x) compared to a MapReduce implementation. We also evaluate the expansibility and scalability of our system.
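
To illustrate why the all-pairs distance computation mentioned in the abstract dominates the cost, here is a minimal single-machine sketch of the two quantities used by the density peaks method: the local density rho of each point, and delta, its distance to the nearest point of higher density. This is not the paper's Spark/GraphX implementation. The function name `density_peaks`, the hard cutoff parameter `d_c`, and the index-based tie-breaking rule are assumptions made for illustration only.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) lists for the given points, using a hard cutoff d_c."""
    n = len(points)
    # All-pairs Euclidean distances: the O(n^2) step the abstract refers to.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho[i]: number of other points closer to point i than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # Distances to points treated as denser than i.
        # Ties in rho are broken by index (an assumption of this sketch).
        higher = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        # The densest point has no denser neighbour, so it gets its
        # largest distance to any point instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta = density_peaks(points, d_c=2.0)
```

Points that combine a high rho with a high delta are the candidate cluster centers. The nested distance table grows quadratically with the number of points, which is the bottleneck a distributed implementation aims to spread across workers.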

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 107, 2017, Pages 442-447