Article code: 4951627
Journal code: 1441484
Publication year: 2017
English article: 41-page PDF
Full-text version: Free download
English title of the ISI article
Towards the efficient parallelization of multi-pass adaptive blocking for entity matching
Persian translation of the title
به سوی موازی موثر بلوک انطباق چند گذر برای سازگاری نهاد
Keywords
Entity matching, Blocking indexing, Adaptive blocking
Related topics
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
English abstract
Modern parallel programming models such as MapReduce (MR) have proven to be powerful tools for the efficient parallel execution of data-intensive tasks such as Entity Matching (EM) in the era of Big Data. For this reason, studying the challenges of applying this well-known cloud computing programming model to EM, and possible solutions to them, has become important. Furthermore, the effectiveness and scalability of MR-based implementations of EM depend on how well the workload is balanced among all reduce tasks. In this article, we investigate how MapReduce can be used to perform efficient (load-balanced) parallel EM using a variation of the multi-pass Sorted Neighborhood Method (SNM) that uses a varying-size (adaptive) window. We propose the Multi-pass MapReduce Duplicate Count Strategy (MultiMR-DCS++), an MR-based approach for multi-pass adaptive SNM that aims to further improve the performance of SNM. Evaluation results based on real-world datasets and cluster infrastructure show that our approach improves the performance of MapReduce-based SNM in terms of both EM execution time and detection quality.
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 101, March 2017, Pages 27-40
Authors
, , ,