Article ID: 453940
Journal: Computers & Electrical Engineering
Published Year: 2016
Pages: 17
File Type: PDF
Abstract

• Analysis and evaluation of several HPC-oriented MapReduce frameworks on a cluster.
• Proposal of a taxonomy to classify these frameworks according to their characteristics.
• Experimental configuration using several workloads, cluster sizes, networks and disk technologies.
• Evaluation in terms of performance and energy efficiency.
• Results useful to select a suitable MapReduce framework and to identify desirable characteristics for future ones.

The ever-growing needs of Big Data applications demand capabilities that traditional systems cannot easily provide, so more and more organizations are adopting High Performance Computing (HPC) to improve scalability and efficiency. Moreover, Big Data frameworks like Hadoop need to be adapted to leverage the resources available in HPC environments. This situation has led to the emergence of several HPC-oriented MapReduce frameworks, which benefit from technologies traditionally oriented to supercomputing, such as high-performance interconnects or the Message Passing Interface (MPI). This work aims to establish a taxonomy of these frameworks together with a thorough evaluation, carried out in terms of performance and energy efficiency metrics. Furthermore, their adaptability to emerging disk technologies, such as solid state drives, has been assessed. The results show that new frameworks like DataMPI can outperform Hadoop, although using IP over InfiniBand also provides significant benefits without code modifications.


Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications