Title (ISI article)
Large-scale incremental processing with MapReduce
Related subjects
Engineering and Basic Sciences · Computer Engineering · Computational Theory and Mathematics
Abstract
Highlights

• Revealing the ineffectiveness of task-level memoization for incremental processing.
• An algorithm to detect changes in large datasets efficiently in Hadoop clusters.
• An efficient implementation to compute the updated result in Hadoop clusters.

An important property of today’s big data processing is that the same computation is often repeated on datasets that evolve over time, such as web and social network data. While repeating the full computation over the entire dataset is feasible with distributed computing frameworks such as Hadoop, it is obviously inefficient and wastes resources. In this paper, we present HadUP (Hadoop with Update Processing), a modified Hadoop architecture tailored to large-scale incremental processing with conventional MapReduce algorithms. Several approaches have been proposed to achieve a similar goal using task-level memoization. However, task-level memoization detects changes in datasets at a coarse-grained level, which often makes such approaches ineffective. Instead, HadUP detects and computes changes in datasets at a fine-grained level using a deduplication-based snapshot differential algorithm (D-SD) and update propagation. As a result, it provides high performance, especially in environments where task-level memoization has no benefit. HadUP requires little extra programming effort because it can reuse the code of Hadoop map and reduce functions, so developing HadUP applications is quite easy.
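To make the idea of fine-grained change detection concrete, the following is a minimal single-machine sketch of snapshot differencing by content hashing, loosely in the spirit of the D-SD algorithm the abstract describes. All names here (`snapshot_diff`, the `("+", key, value)` / `("-", key, value)` delta records) are hypothetical illustrations, not the paper's actual API; the real algorithm runs as a distributed job on a Hadoop cluster.

```python
import hashlib

def snapshot_diff(old_snapshot, new_snapshot):
    """Hypothetical sketch: detect record-level changes between two
    snapshots of a keyed dataset by comparing content hashes.

    Unchanged records are skipped; only inserted, deleted, or modified
    records are emitted as a delta, which an incremental-processing
    engine could then propagate through map and reduce functions.
    """
    digest = lambda v: hashlib.sha1(v.encode("utf-8")).hexdigest()
    old_hashes = {k: digest(v) for k, v in old_snapshot.items()}
    delta = []
    for k, v in new_snapshot.items():
        h = digest(v)
        if k not in old_hashes:
            delta.append(("+", k, v))                # inserted record
        elif old_hashes[k] != h:
            delta.append(("-", k, old_snapshot[k]))  # old version retracted
            delta.append(("+", k, v))                # new version added
    for k, v in old_snapshot.items():
        if k not in new_snapshot:
            delta.append(("-", k, v))                # deleted record
    return delta
```

In an update-propagation scheme, only the records in this delta (a small fraction of a large, slowly evolving dataset) need to be re-run through the map and reduce functions, rather than recomputing the full result from scratch.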

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 36, July 2014, Pages 66–79