Article code: 424515
Journal code: 685582
Publication year: 2016
English article: 9 pages, PDF
Full-text version: free download
English title of the ISI article
Benchmarking performance for migrating a relational application to a parallel implementation
Persian translation of the title
ارزیابی عملکرد برای جابجایی یک برنامه ارتباطی به اجرای موازی
Keywords
Hive; Hadoop; Benchmark; Big data; SQL; Queries
Related topics
Engineering and Basic Sciences; Computer Engineering; Computational Theory and Mathematics
English abstract

Many organizations rely on relational database platforms for OLAP-style querying (aggregation and filtering) in small- to medium-sized applications. We investigate the impact of scaling up the data sizes for such queries, to illustrate what performance an organization could expect should it migrate current applications to big data environments. This paper benchmarks the performance of Hive (Thusoo et al., 2009), a parallel data warehouse platform that is part of the Hadoop software stack. We set up a 4-node Hadoop cluster using Hortonworks HDP 1.3.2. We use the data generator provided by the TPC-DS benchmark (DSGen v1.1.0) to generate data at different scales. We compare the performance of loading and querying data with SQL on a relational database installation (MySQL) and with Hive Query Language (HiveQL) on a Hive cluster. We measure the speedup in query execution for three dataset sizes resulting from the scale-up. Hive loads the large datasets faster than MySQL, while it is marginally slower than MySQL when loading the smaller datasets. Query execution in Hive is also faster. We also investigate executing Hive queries concurrently in workloads and conclude that serial execution of queries is a much better practice for clusters with limited resources.
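
The abstract does not spell out how speedup is computed. A minimal sketch, assuming the common definition (baseline time divided by candidate time, here MySQL over Hive), might look like the following. The function name, the scale labels, and every timing value are hypothetical placeholders for illustration and are not results from the paper.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return baseline time / candidate time; a value above 1 means the candidate is faster."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / candidate_seconds


# Hypothetical (MySQL seconds, Hive seconds) pairs per dataset scale.
# These numbers are invented placeholders, NOT measurements from the paper.
placeholder_timings = {
    "scale_a": (100.0, 50.0),
    "scale_b": (400.0, 100.0),
    "scale_c": (900.0, 150.0),
}

for scale, (mysql_s, hive_s) in placeholder_timings.items():
    print(f"{scale}: speedup = {speedup(mysql_s, hive_s):.2f}")
```

Under this definition a ratio below 1 would mean the relational baseline was faster, which is the situation the abstract describes for loading the smaller datasets.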

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 63, October 2016, Pages 148–156