Locality and loading aware virtual machine mapping techniques for optimizing communications in MapReduce applications

Article ID	Journal	Published Year	Pages	File Type
424567	Future Generation Computer Systems	2015	12 Pages	PDF

Abstract

• Improving performance of MapReduce programs in heterogeneous environments and hybrid clouds.
• Enhancing data locality through a virtual machine mapping technique.
• Optimizing shuffle performance and reducing communication overheads in distributed systems.
• We propose a loading aware technique to balance workload of reducers at run-time.

Big data refers to data that is so large that it exceeds the processing capabilities of traditional systems. Big data can be awkward to work with, and its storage, processing and analysis can be problematic. MapReduce is a programming model that can handle big data. It achieves this by distributing the storage and processing of data amongst a large number of computers (nodes). However, this means that the time required to process a MapReduce job depends on whichever node is last to complete its task. Heterogeneous environments exacerbate this problem.

In this paper we propose a method to improve MapReduce execution in heterogeneous environments. This is done by dynamically partitioning data before the Map phase and by using virtual machine mapping in the Reduce phase in order to maximize resource utilization. Simulation and experimental results show an improvement in MapReduce performance, including data locality and total completion time, with different optimization approaches.
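The dependence of job completion time on the slowest node can be illustrated with a small sketch. This is not the method proposed in the paper; it is a toy model under stated assumptions: each node processes its share of the data at a fixed rate, the node speeds and data size are made-up values, and the helper names (`completion_time`, `equal_partition`, `proportional_partition`) are hypothetical. It compares an equal split of the input with a split proportional to node speed.

```python
def completion_time(shares, speeds):
    """Job completion time in the toy model: the finish time of the slowest node."""
    return max(share / speed for share, speed in zip(shares, speeds))


def equal_partition(total, speeds):
    """Give every node the same amount of data, ignoring heterogeneity."""
    return [total / len(speeds)] * len(speeds)


def proportional_partition(total, speeds):
    """Give every node a share proportional to its (assumed known) speed."""
    capacity = sum(speeds)
    return [total * speed / capacity for speed in speeds]


# Hypothetical heterogeneous cluster: processing rates in MB/s (illustrative values).
speeds = [4.0, 2.0, 1.0, 1.0]
total = 800.0  # MB of input data (illustrative value)

equal = completion_time(equal_partition(total, speeds), speeds)
proportional = completion_time(proportional_partition(total, speeds), speeds)

print(f"equal split:        {equal:.1f} s")
print(f"proportional split: {proportional:.1f} s")
```

In this toy setting the equal split is held back by the two slowest nodes, while the proportional split lets all nodes finish at the same time. A real system would also have to account for data locality, shuffle traffic and run-time load changes, which the sketch ignores.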

Keywords

Big data; Cloud computing; Distributed computing; Virtual machines; Heterogeneity; MapReduce