DBCURE-MR: An efficient density-based clustering algorithm for large data using MapReduce

Article code	Journal code	Publication year	English article	Full text
396704	670557	2014	21 pages, PDF	Free download

Keywords

Clustering algorithm; Parallel algorithm; Density-based clustering; MapReduce

Related topics

Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence

Abstract

• DBCURE, a density-based clustering algorithm, can find clusters with varying densities.
• DBCURE is a generalization of DBSCAN using ellipsoidal neighborhoods.
• We propose a parallel version of DBCURE, called DBCURE-MR, using MapReduce.
• DBCURE-MR finds clusters correctly based on the definition of density-based clusters.
• Experimental results show the efficiency and scalability of the proposed algorithms.

Clustering is a useful data mining technique that groups data points so that points within a group have similar characteristics while points in different groups are dissimilar. Density-based clustering algorithms such as DBSCAN and OPTICS are one widely used family of clustering algorithms. As more applications must deal with vast amounts of data, clustering such big data has become a challenging problem. Recently, parallelizing clustering algorithms on a large cluster of commodity machines using the MapReduce framework has received a lot of attention.

In this paper, we first propose a new density-based clustering algorithm, called DBCURE, which robustly finds clusters with varying densities and is suitable for parallelization with MapReduce. We next develop DBCURE-MR, a parallelized version of DBCURE using MapReduce. While traditional density-based algorithms find each cluster one by one, DBCURE-MR finds several clusters together in parallel. We prove that both DBCURE and DBCURE-MR find the clusters correctly based on the definition of density-based clusters. Our experimental results with various data sets confirm that DBCURE-MR finds clusters efficiently, is not sensitive to clusters with varying densities, and scales up well with the MapReduce framework.
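The highlights describe DBCURE as a generalization of DBSCAN that uses ellipsoidal neighborhoods. The abstract does not give the exact definition, so the following is only a toy sketch of that general idea, not the paper's algorithm: it runs a standard DBSCAN-style expansion but replaces the circular distance test with a quadratic-form test. The names `in_ellipsoid`, `ellipsoidal_dbscan`, `inv_cov`, `tau`, and `min_pts` are hypothetical, and using one fixed shape matrix for every point is a simplifying assumption made here for illustration.

```python
from collections import deque


def in_ellipsoid(p, q, inv_cov, tau):
    """True if q lies in the ellipsoid centred at p,
    i.e. (q - p)^T * inv_cov * (q - p) <= tau^2."""
    d = [qi - pi for pi, qi in zip(p, q)]
    dim = len(d)
    quad = sum(d[i] * inv_cov[i][j] * d[j] for i in range(dim) for j in range(dim))
    return quad <= tau * tau


def ellipsoidal_dbscan(points, inv_cov, tau, min_pts):
    """DBSCAN-style clustering with an ellipsoidal neighbourhood test.

    Assumption (not from the paper): one shape matrix inv_cov is shared by
    all points. Returns one label per point; -1 marks noise.
    """
    n = len(points)
    # Brute-force neighbourhoods; each point counts as its own neighbour.
    neighbors = [
        [j for j in range(n) if in_ellipsoid(points[i], points[j], inv_cov, tau)]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels


# Two horizontal rows of points plus one far-away outlier.
points = [(0, 0), (2, 0), (4, 0), (0, 3), (2, 3), (4, 3), (20, 20)]
# Ellipse with semi-axes 2.5 (x) and 1 (y) when tau = 1.
wide_ellipse = [[0.16, 0.0], [0.0, 1.0]]
print(ellipsoidal_dbscan(points, wide_ellipse, tau=1.0, min_pts=2))
# prints [0, 0, 0, 1, 1, 1, -1]
```

In this toy data set, a circular neighborhood large enough to link the points within a row (radius 2) still keeps the rows apart, but a radius of 3 merges them. The elongated ellipse links points along each row with more slack while staying far from the other row.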

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Systems - Volume 42, June 2014, Pages 15–35

Authors

Younghoon Kim, Kyuseok Shim, Min-Soeng Kim, June Sup Lee
