pp-PIC: Parallel power iteration clustering for big data

Article code	Journal code	Publication year	English article	Full text
431868	688642	2013	8-page PDF	Free download

Keywords

Clustering; Spectral clustering; Data mining; Cloud computing; Distributed computing; Parallel computing; Big data; Machine learning

Related topics

Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics

Abstract

Power iteration clustering (PIC) is a newly developed clustering algorithm. It clusters data points by embedding them in a low-dimensional subspace derived from the similarity matrix. Compared to traditional clustering algorithms, PIC is simple, fast and relatively scalable. However, it requires that the data and the associated similarity matrix fit into memory, which makes the algorithm infeasible for big data applications. This paper attempts to expand PIC’s data scalability by implementing a parallel power iteration clustering algorithm (pp-PIC). While the paper focuses on exploring different parallelization strategies and implementation details for minimizing computation and communication costs, we have also paid close attention to ensuring that the algorithm works well on low-end commodity computers (COTS-based clusters and the general-purpose servers found at most commercial cloud providers). The experimental results demonstrate that the proposed pp-PIC algorithm scales well with both data size and compute resources.
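To make the embedding idea in the abstract concrete, here is a minimal serial sketch of power iteration clustering in plain Python. The abstract does not spell out the update rule, so the details below are assumptions based on the commonly described form of PIC: the similarity matrix is row-normalized, the starting vector is the normalized degree vector, iteration stops early when the change between steps levels off, and the resulting one-dimensional embedding is grouped with a simple k-means. The function names (`row_normalize`, `pic_embedding`, `kmeans_1d`, `pic_cluster`) are hypothetical and are not taken from the paper. The sketch also assumes every point has a positive total similarity.

```python
def row_normalize(affinity):
    """Return W = D^-1 A, where D holds the row sums of A (assumed > 0)."""
    return [[a / sum(row) for a in row] for row in affinity]


def pic_embedding(affinity, max_iter=100, eps=1e-5):
    """One-dimensional embedding from truncated power iteration (assumed form)."""
    w = row_normalize(affinity)
    degrees = [sum(row) for row in affinity]
    total = sum(degrees)
    v = [d / total for d in degrees]  # assumed start: normalized degree vector
    prev_delta = None
    for _ in range(max_iter):
        nxt = [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]
        norm = sum(abs(x) for x in nxt)  # L1 normalization
        nxt = [x / norm for x in nxt]
        delta = max(abs(a - b) for a, b in zip(nxt, v))
        v = nxt
        # Stop once the step size itself has nearly stopped changing.
        if prev_delta is not None and abs(prev_delta - delta) < eps:
            break
        prev_delta = delta
    return v


def kmeans_1d(values, k, iters=50):
    """Tiny 1-D k-means with centers initialized evenly between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)] if k > 1 else [lo]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels


def pic_cluster(affinity, k):
    """Cluster points given a dense similarity matrix (list of lists)."""
    return kmeans_1d(pic_embedding(affinity), k)


# Toy data: two groups (sizes 3 and 4), strong similarity inside a group,
# weak similarity across groups, zero on the diagonal.
toy = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.01)
        for j in range(7)] for i in range(7)]
toy_labels = pic_cluster(toy, 2)
```

The dense list-of-lists matrix is exactly the memory bottleneck the abstract mentions: storage grows with the square of the number of points, which is what motivates distributing the matrix across machines.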

► Proposed parallel power iteration clustering algorithm.
► Implemented the pp-PIC algorithm using MPI.
► Ran the pp-PIC algorithm on different data sizes.
► Ran the algorithm on both cluster and cloud computers.
► Obtained almost linear speedups.
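The listing does not say how the pp-PIC implementation divides work among MPI processes. As an illustration only, the sketch below simulates one common option, row-block partitioning of the matrix-vector product, in plain Python without MPI. Each simulated worker owns a contiguous block of rows of the normalized matrix and computes its slice of the product; concatenating the slices stands in for a gather step. The function names (`split_rows`, `local_matvec`, `blocked_matvec`) are hypothetical, and whether the paper uses this scheme is an assumption.

```python
def split_rows(n_rows, n_workers):
    """Return (start, stop) row ranges, spread as evenly as possible."""
    base, extra = divmod(n_rows, n_workers)
    bounds, start = [], 0
    for rank in range(n_workers):
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def local_matvec(row_block, v):
    """Work done by one worker: its rows of W times the full vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in row_block]


def blocked_matvec(w, v, n_workers):
    """Simulate the distributed product W v; a real run would gather slices."""
    result = []
    for start, stop in split_rows(len(w), n_workers):
        result.extend(local_matvec(w[start:stop], v))  # stands in for a gather
    return result


# Small check that the blocked product matches the serial one.
w_demo = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
v_demo = [0.2, 0.3, 0.5]
serial = local_matvec(w_demo, v_demo)
blocked = blocked_matvec(w_demo, v_demo, 2)
```

Under this scheme each worker stores only its own rows of the matrix, so memory per machine shrinks as workers are added, while every worker still needs the full vector at each iteration. That per-iteration exchange is the kind of communication cost the abstract says the paper tries to minimize.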

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 3, March 2013, Pages 352–359

Authors

Weizhong Yan, Umang Brahmakshatriya, Ya Xue, Mark Gilder, Bowden Wise
