Article code: 484074 | Journal code: 703253 | Year of publication: 2016 | English article: 12 pages (PDF)
English title of the ISI article
Faster cloud Star Joins with Reduced Disk Spill and Network Communication
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
Abstract (English)

By combining powerful parallel frameworks with on-demand commodity hardware, cloud computing has made analytics and decision support systems standard tools for enterprises of all sizes. These companies accumulate unprecedented volumes of data, which makes filtering and retrieving that data a pressing challenge. Such data is often organized in star schemas, in which Star Joins are ubiquitous and expensive operations. In particular, excessive disk spill and network communication are tight bottlenecks for all current MapReduce or Spark solutions. Here, we propose two efficient solutions that reduce computation time by at least 60%: the Spark Bloom-Filtered Cascade Join (SBFCJ) and the Spark Broadcast Join (SBJ). In contrast, a direct Spark implementation of a sequence of joins performs poorly, which shows the importance of additional filtering to minimize disk spill and network communication. Finally, while SBJ is twice as fast when each executor has enough memory, SBFCJ is remarkably resilient in low-memory scenarios. Both algorithms are highly competitive solutions for Star Joins in the cloud.
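The abstract contrasts two ways of shrinking a Star Join before data crosses the network: Bloom-filtering the fact table (the idea behind SBFCJ) and keeping the small dimension tables in memory on every worker (the idea behind SBJ). The sketch below illustrates both ideas in plain, single-process Python, assuming each dimension table has already been filtered by the query predicates and fits in memory as a dict keyed by its primary key. It is not Spark code and not the authors' implementation; the class `BloomFilter`, the function `bloom_filtered_star_join`, the SHA-256-derived hash positions, and the defaults `m=1024`, `k=3` are hypothetical choices made only for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper): m bits, k hash positions."""

    def __init__(self, m: int = 1024, k: int = 3):
        if not 1 <= k <= 8:
            raise ValueError("k must be in 1..8 (one SHA-256 digest gives 8 chunks)")
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from 4-byte chunks of a single SHA-256 digest.
        digest = hashlib.sha256(repr(key).encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def bloom_filtered_star_join(fact_rows, dims, m=1024, k=3):
    """Join fact rows with pre-filtered dimension tables (hypothetical sketch).

    dims maps a foreign-key column name to {primary_key: dimension_row}.
    Returns (joined_rows, number_of_fact_rows_surviving_the_bloom_phase).
    """
    # Build one Bloom filter per dimension from its surviving keys.
    filters = {}
    for fk, table in dims.items():
        bf = BloomFilter(m, k)
        for key in table:
            bf.add(key)
        filters[fk] = bf

    # Phase 1: cheap pre-filter that discards most non-matching fact rows.
    candidates = [
        row for row in fact_rows
        if all(filters[fk].might_contain(row[fk]) for fk in dims)
    ]

    # Phase 2: exact hash lookups against the in-memory dimension tables;
    # this also removes any Bloom false positives.
    result = []
    for row in candidates:
        if all(row[fk] in table for fk, table in dims.items()):
            joined = dict(row)
            for fk, table in dims.items():
                joined.update(table[row[fk]])
            result.append(joined)
    return result, len(candidates)


if __name__ == "__main__":
    fact = [
        {"date_id": 1, "store_id": 10, "amount": 5.0},
        {"date_id": 2, "store_id": 10, "amount": 7.5},
        {"date_id": 1, "store_id": 11, "amount": 3.0},
    ]
    dims = {
        "date_id": {1: {"year": 2016}},
        "store_id": {10: {"city": "Sao Paulo"}},
    }
    rows, n_candidates = bloom_filtered_star_join(fact, dims)
    print(rows, n_candidates)
```

Because phase 1 can only produce false positives, the exact lookups in phase 2 keep the result correct. In a distributed setting, the benefit would come from fewer fact rows reaching the shuffle, which is where the abstract locates the disk-spill and network bottlenecks.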

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 80, 2016, Pages 74–85