Article code: 484820
Journal code: 703295
Publication year: 2015
English article: 10-page PDF
Full-text version: Free download
English title of the ISI article
Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

One of the biggest challenges of the current big data landscape is our inability to process vast amounts of information in a reasonable time. In this work, we explore and compare two distributed computing frameworks implemented on commodity cluster architectures: MPI/OpenMP on Beowulf that is high-performance oriented and exploits multi-machine/multi-core infrastructures, and Apache Spark on Hadoop which targets iterative algorithms through in-memory computing. We use the Google Cloud Platform service to create virtual machine clusters, run the frameworks, and evaluate two supervised machine learning algorithms: KNN and Pegasos SVM. Results obtained from experiments with a particle physics data set show MPI/OpenMP outperforms Spark by more than one order of magnitude in terms of processing speed and provides more consistent performance. However, Spark shows better data management infrastructure and the possibility of dealing with other aspects such as node failure and data replication.

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 53, 2015, Pages 121-130