کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
4945058 | 1438292 | 2017 | 11 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
Distributed snapshot maintenance in wide-column NoSQL databases using partitioned incremental ETL pipelines
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
هوش مصنوعی
پیش نمایش صفحه اول مقاله

چکیده انگلیسی
This paper describes an extended version of HBelt system [1] which tightly integrates the wide-column NoSQL database HBase with a clustered & pipelined ETL engine. Our objective is to efficiently refresh HBase tables with remote source updates while a consistent snapshot is guaranteed across distributed partitions for each scan request in analytical queries. A consistency model is defined and implemented to address so-called distributed snapshot maintenance. To achieve this, ETL jobs and analytical queries are scheduled in a distributed processing environment. In addition, a partitioned, incremental ETL pipeline is introduced to increase the performance of ETL (update) jobs. We validate the efficiency gain in terms of data pipelining and data partitioning using the TPC-DS benchmark, which simulates a modern decision support system for a retail product supplier. Experimental results show that high query throughput can be achieved in HBelt when distributed, refreshed snapshots are demanded.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 70, October 2017, Pages 48-58
Journal: Information Systems - Volume 70, October 2017, Pages 48-58
نویسندگان
Weiping Qu, Stefan Dessloch,