Article code: 6874884
Journal code: 1441462
Publication year: 2018
English article: 11-page PDF
Full-text version: Free download
English title of the ISI article
An efficient theta-join query processing in distributed environment
Persian translation of the title
یک پردازش پرس و جو پرسرعت در محیط توزیع شده (literally: "A fast query processing in a distributed environment")
Keywords
Distributed parallel framework, theta-join algorithm, query optimization, large-scale data processing
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract
Theta-join queries are very useful in many data analysis tasks, but they are not processed efficiently in distributed environments, especially on large-scale data. Although much progress has been made in handling theta-joins with the MapReduce paradigm, existing methods are either complex, requiring fundamental changes to the MapReduce framework, or consider only the overhead of load balancing in the network. When the data scale is large, they incur high computation cost and induce OOM (Out of Memory) errors. In this work, we propose a filter method for theta-joins that aims to reduce the computation cost and achieve the minimum execution time in a distributed environment. We consider not only the load balance in the cluster but also the memory cost in the parallel framework. We also propose a keys-based join solution for multi-way theta-joins that reduces the amount of data in the cross product and thereby improves join performance. We implement our methods in Spark, a popular general-purpose data processing framework. The experimental results demonstrate that our methods significantly improve the performance of theta-joins compared with state-of-the-art solutions.
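For readers unfamiliar with the term, a theta-join pairs rows from two relations under an arbitrary comparison (for example `a < b`) rather than equality. The sketch below is only an illustration of why filtering before the cross product can save work. It is not the method proposed in the paper, and it runs in plain Python rather than Spark. The function names, the chunk size, and the sample values are all hypothetical.

```python
from itertools import product

def naive_theta_join(r, s, theta):
    """Evaluate the predicate on every pair: len(r) * len(s) comparisons."""
    return [(a, b) for a, b in product(r, s) if theta(a, b)]

def chunk(values, size):
    """Sort the values and split them into consecutive chunks."""
    values = sorted(values)
    return [values[i:i + size] for i in range(0, len(values), size)]

def filtered_less_than_join(r, s, size=2):
    """Join on a < b, skipping chunk pairs that cannot contain a match."""
    out = []
    compared = 0
    for rc in chunk(r, size):
        for sc in chunk(s, size):
            # Chunks are sorted, so rc[0] is the smallest a and sc[-1]
            # the largest b. If min(a) >= max(b), no pair satisfies a < b.
            if rc[0] >= sc[-1]:
                continue
            for a, b in product(rc, sc):
                compared += 1
                if a < b:
                    out.append((a, b))
    return out, compared

# Hypothetical sample data, chosen only for illustration.
r = [1, 5, 9, 12]
s = [2, 3, 6, 10]
naive = naive_theta_join(r, s, lambda a, b: a < b)
pruned, compared = filtered_less_than_join(r, s)
```

On this sample, both functions return the same 7 pairs, but the filtered version evaluates the predicate on 12 pairs instead of 16, because one chunk pair is discarded from its bounds alone. In a distributed setting the analogous saving would be in data shuffled and held in memory, which is the cost the abstract is concerned with.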
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 121, November 2018, Pages 42-52
Authors
, ,