Title (ISI article)
Large-scale incremental processing with MapReduce
Related subjects
Engineering and Basic Sciences · Computer Engineering · Computational Theory and Mathematics
Abstract
Highlights

• Revealing the ineffectiveness of task-level memoization for incremental processing.
• An algorithm to detect changes in large datasets efficiently in Hadoop clusters.
• An efficient implementation to compute the updated result in Hadoop clusters.

An important property of today’s big data processing is that the same computation is often repeated on datasets that evolve over time, such as web and social network data. While repeating the full computation over the entire dataset is feasible with distributed computing frameworks such as Hadoop, it is obviously inefficient and wastes resources. In this paper, we present HadUP (Hadoop with Update Processing), a modified Hadoop architecture tailored to large-scale incremental processing with conventional MapReduce algorithms. Several approaches have been proposed to achieve a similar goal using task-level memoization. However, task-level memoization detects changes in datasets at a coarse-grained level, which often makes such approaches ineffective. Instead, HadUP detects and computes changes in datasets at a fine-grained level using a deduplication-based snapshot differential algorithm (D-SD) and update propagation. As a result, it provides high performance, especially in environments where task-level memoization has no benefit. HadUP requires little extra programming effort because it can reuse the code of Hadoop map and reduce functions, so developing HadUP applications is quite easy.
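To make the idea of fine-grained change detection concrete, the following is a minimal single-machine sketch of snapshot differencing by content hashing, loosely in the spirit of the D-SD algorithm the abstract describes. All names here (`snapshot_diff`, the `("+", key, value)` / `("-", key, value)` delta records) are hypothetical illustrations, not the paper's actual API; the real algorithm runs as a distributed job on a Hadoop cluster.

```python
import hashlib

def snapshot_diff(old_snapshot, new_snapshot):
    """Hypothetical sketch: detect record-level changes between two
    snapshots of a keyed dataset by comparing content hashes.

    Unchanged records are skipped; only inserted, deleted, or modified
    records are emitted as a delta, which an incremental-processing
    engine could then propagate through map and reduce functions.
    """
    digest = lambda v: hashlib.sha1(v.encode("utf-8")).hexdigest()
    old_hashes = {k: digest(v) for k, v in old_snapshot.items()}
    delta = []
    for k, v in new_snapshot.items():
        h = digest(v)
        if k not in old_hashes:
            delta.append(("+", k, v))                # inserted record
        elif old_hashes[k] != h:
            delta.append(("-", k, old_snapshot[k]))  # old version retracted
            delta.append(("+", k, v))                # new version added
    for k, v in old_snapshot.items():
        if k not in new_snapshot:
            delta.append(("-", k, v))                # deleted record
    return delta
```

In an update-propagation scheme, only the records in this delta (a small fraction of a large, slowly evolving dataset) need to be re-run through the map and reduce functions, rather than recomputing the full result from scratch.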

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 36, July 2014, Pages 66–79