Towards Scalability and Data Skew Handling in GroupBy-Joins using MapReduce Model

Article code	Journal code	Publication year	English article	Full-text version
485906	703344	2015	10-page PDF	Free download


Related topics

Engineering and Basic Sciences > Computer Engineering > Computer Science (General)


Abstract

For over a decade, MapReduce has been the leading programming model for massively parallel processing of large volumes of data. Its adoption has been driven by the development of many frameworks, such as Spark, Pig and Hive, that facilitate data analysis on large-scale systems. However, these frameworks remain vulnerable to communication costs, data skew and task imbalance. These problems can have a devastating effect on the performance and scalability of such systems, particularly when processing GroupBy-Join queries on large datasets. In this paper, we present a new GroupBy-Join algorithm that considerably reduces communication costs while avoiding the effects of data skew. A cost analysis of this algorithm shows that our approach is insensitive to data skew and ensures perfect balancing properties during all stages of GroupBy-Join computation, even for highly skewed data. These results have been confirmed by a series of experiments.
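To make the term concrete, the sketch below shows what a GroupBy-Join query computes and why skewed keys are costly. It is not the algorithm presented in the paper, whose details the abstract does not give. It only illustrates the general, well-known idea of aggregating each input per key before joining, for a query of the form `SELECT key, COUNT(*) FROM R JOIN S ON R.key = S.key GROUP BY key`. The relations `r` and `s`, the function names and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def naive_groupby_join(r, s):
    """Enumerate every joined pair, then count pairs per key.

    For a key that is frequent in both inputs, the number of pairs grows
    with the product of its frequencies, which is where skew hurts.
    """
    counts = Counter()
    for r_key, _ in r:
        for s_key, _ in s:
            if r_key == s_key:
                counts[r_key] += 1
    return dict(counts)


def preaggregated_groupby_join(r, s):
    """Count each side per key first, then combine the two small summaries.

    Only one (key, count) entry per distinct key would need to be
    exchanged, instead of every joined pair.
    """
    r_counts = Counter(key for key, _ in r)
    s_counts = Counter(key for key, _ in s)
    return {
        key: r_counts[key] * s_counts[key]
        for key in r_counts
        if key in s_counts
    }


# Hypothetical toy data: key "a" is heavily skewed in both relations.
r = [("a", i) for i in range(1000)] + [("b", 0), ("c", 0)]
s = [("a", i) for i in range(500)] + [("b", 1), ("d", 1)]

# Both strategies give the same answer; only the amount of work differs.
assert naive_groupby_join(r, s) == preaggregated_groupby_join(r, s)
```

In this toy case the naive version inspects roughly half a million record pairs, while the pre-aggregated version handles four distinct keys per side. A distributed implementation would also have to balance the work across nodes, which this single-process sketch does not attempt.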

Publisher

Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 51, 2015, Pages 70-79
