Article code: 4942425
Journal code: 1437283
Publication year: 2017
English article: 23-page PDF
Full-text version: Free download
English title of the ISI article
A Fine‐Grained Distribution Approach for ETL Processes in Big Data Environments
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
Among the so-called “4Vs” (volume, velocity, variety, and veracity) that characterize the complexity of Big Data, this paper focuses on volume, with the aim of ensuring good performance for Extracting-Transforming-Loading (ETL) processes. We propose a new fine-grained parallelization/distribution approach for populating the Data Warehouse (DW). Unlike prior approaches, which distribute the ETL process only at a coarse-grained level of processing, our approach supports parallelization/distribution at three levels: the “process” level for coarse-grained distribution, and the “functionality” and “elementary functions” levels for fine-grained distribution. An ETL process is described in terms of its core functionalities, which can run on a cluster of computers according to the MapReduce (MR) paradigm. Our performance analysis shows that, with 25 to 38 parallel tasks, the approach speeds up the ETL process by up to 33%, with a linear improvement rate.
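As a rough illustration of the fine-grained idea described in the abstract, the sketch below splits one ETL functionality into parallel map tasks followed by a reduce step. It is not the authors' implementation: the function names (`partition`, `clean`, `map_task`, `reduce_task`, `run_functionality`), the row layout, and the use of a local thread pool in place of a real MapReduce cluster are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_tasks):
    """Split the source rows into n_tasks roughly equal partitions."""
    return [rows[i::n_tasks] for i in range(n_tasks)]


def clean(row):
    """Hypothetical elementary function: normalize one source row."""
    return {
        "product": row["product"].strip().upper(),
        "amount": float(row["amount"]),
    }


def map_task(rows):
    """Map phase: clean each row of one partition and emit (key, value) pairs."""
    return [(r["product"], r["amount"]) for r in map(clean, rows)]


def reduce_task(pairs):
    """Reduce phase: sum the values emitted for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_functionality(rows, n_tasks=4):
    """Run one ETL functionality as n_tasks parallel map tasks plus one reduce.

    A thread pool stands in for the cluster here (an assumption of this sketch).
    """
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        mapped = list(pool.map(map_task, partition(rows, n_tasks)))
    pairs = [pair for part in mapped for pair in part]
    return reduce_task(pairs)
```

Calling `run_functionality(rows, n_tasks=2)` on a list of dictionaries with `product` and `amount` keys returns the per-product totals. Distributing at the "process" level would instead run whole ETL processes side by side, which this sketch does not attempt.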
Publisher
Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 111, September 2017, Pages 114-136