Article code: 378717
Journal code: 659210
Publication year: 2016
English article: 15 pages, PDF
Full-text version: free download
English title of the ISI article
CAT: A Cost-Aware Translator for SQL-query workflow to MapReduce jobflow
Title as rendered in the Persian listing
CAT: a cost-aware translator from SQL query workflows to MapReduce jobflows
Keywords
MapReduce; SQL-to-MapReduce; Intra-SQL correlation; Cost estimation model; Hadoop; Query
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

MapReduce is undoubtedly the most popular framework for large-scale processing and analysis of vast data sets on clusters of machines. To make MapReduce easier to use, SQL-like declarative languages and SQL-to-MapReduce translators have attracted increasing attention recently. A SQL-to-MapReduce translator automatically generates the MapReduce jobflow for each SQL query submitted by users, which significantly simplifies the interface between users and systems. Although a plethora of translators have been developed, the auto-generated MapReduce programs still suffer from extreme inefficiency. In this paper, we attempt to address this challenge by developing a novel Cost-Aware Translator (CAT). CAT has two notable features. First, it defines two intra-SQL correlations, Generalized Job Flow Correlation (GJFC) and Input Correlation (IC), based on which a set of looser merging rules is introduced. Both Top-Down (TD) and Bottom-Up (BU) merging strategies are then proposed and integrated into CAT. Second, it adopts a cost estimation model for MapReduce jobflows to guide the selection of the more efficient of the MapReduce jobflows auto-generated by the TD and BU merging strategies. Finally, comparative experiments on the TPC-H benchmark demonstrate the effectiveness and scalability of CAT.

Publisher
Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 102, March 2016, Pages 42–56