Article code | Journal code | Publication year | English article | Full text |
---|---|---|---|---|
484820 | 703295 | 2015 | 10-page PDF | Free download |

One of the biggest challenges of the current big data landscape is our inability to process vast amounts of information in a reasonable time. In this work, we explore and compare two distributed computing frameworks implemented on commodity cluster architectures: MPI/OpenMP on Beowulf, which is high-performance oriented and exploits multi-machine/multi-core infrastructures, and Apache Spark on Hadoop, which targets iterative algorithms through in-memory computing. We use the Google Cloud Platform service to create virtual machine clusters, run the frameworks, and evaluate two supervised machine learning algorithms: KNN and Pegasos SVM. Results obtained from experiments with a particle physics data set show that MPI/OpenMP outperforms Spark by more than one order of magnitude in terms of processing speed and provides more consistent performance. However, Spark shows better data management infrastructure and the possibility of dealing with other aspects such as node failure and data replication.
Journal: Procedia Computer Science - Volume 53, 2015, Pages 121-130
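
For readers unfamiliar with Pegasos SVM, one of the two algorithms named in the abstract, the following is a minimal single-machine sketch of the standard Pegasos stochastic sub-gradient update for a linear SVM. It is not the authors' MPI/OpenMP or Spark implementation. The function names (`pegasos_train`, `predict`), the parameter defaults, and the toy data are assumptions made purely for illustration, and the sketch omits the bias term and the optional projection step.

```python
import random


def pegasos_train(samples, labels, lam=0.01, iterations=2000, seed=0):
    """Train a linear SVM (no bias term) with the Pegasos update rule.

    samples: list of feature vectors (lists of floats)
    labels:  list of +1 / -1 class labels
    lam:     regularization strength (hypothetical default)
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, iterations + 1):
        # Pick one training example uniformly at random.
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)  # step size decays as 1 / (lambda * t)
        # Margin is computed with the weights from before this step.
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Regularization shrinks the weights on every step.
        shrink = 1.0 - eta * lam
        w = [shrink * wj for wj in w]
        # The hinge loss contributes only when the margin is violated.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1


if __name__ == "__main__":
    # Toy, linearly separable data invented for this sketch
    # (not the particle physics data set used in the paper).
    X = [[2.0, 1.0], [1.0, 2.0], [3.0, 3.0],
         [-2.0, -1.0], [-1.0, -2.0], [-3.0, -3.0]]
    y = [1, 1, 1, -1, -1, -1]
    w = pegasos_train(X, y)
    print([predict(w, x) for x in X])
```

Because each iteration touches only one example and one weight vector, the loop is inherently sequential. Any distributed version, whether on MPI/OpenMP or Spark, has to decide how to partition the data and combine weight updates, and the abstract does not describe how the authors did this.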