Article ID: 424567
Journal: Future Generation Computer Systems
Published Year: 2015
Pages: 12
File Type: PDF
Abstract

• Improving the performance of MapReduce programs in heterogeneous environments and hybrid clouds.
• Enhancing data locality through a virtual machine mapping technique.
• Optimizing shuffle performance and reducing communication overheads in distributed systems.
• We propose a loading-aware technique to balance the workload of reducers at run-time.
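The highlights do not spell out how the loading-aware reducer balancing works. The following is a minimal sketch of one common way to do it, not the authors' technique. It assumes the size of each intermediate partition is known once the Map phase has finished and that each reducer's processing speed (records per second) can be measured. It then greedily places the largest partitions first on whichever reducer would finish earliest. The function name `assign_partitions` and all numbers are hypothetical.

```python
def assign_partitions(partition_sizes, reducer_speeds):
    """Greedily assign intermediate partitions to heterogeneous reducers.

    partition_sizes: records in each partition (index = partition id)
    reducer_speeds:  records per second each reducer can process (assumed known)
    Returns (assignment, finish_times).
    """
    loads = [0.0] * len(reducer_speeds)  # records assigned to each reducer so far
    assignment = {}
    # Largest partitions first, so big (skewed) partitions are placed while
    # every reducer still has spare capacity.
    order = sorted(range(len(partition_sizes)), key=lambda p: -partition_sizes[p])
    for pid in order:
        size = partition_sizes[pid]
        # Choose the reducer whose projected finish time is lowest after taking it.
        best = min(range(len(reducer_speeds)),
                   key=lambda r: (loads[r] + size) / reducer_speeds[r])
        loads[best] += size
        assignment[pid] = best
    finish_times = [load / speed for load, speed in zip(loads, reducer_speeds)]
    return assignment, finish_times


# Hypothetical example: five partitions, one reducer twice as fast as the other.
assignment, finish = assign_partitions([8, 4, 4, 2, 2], [2.0, 1.0])
print(assignment)  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 0}
print(finish)      # [7.0, 6.0]
```

Here the faster reducer receives 14 records and the slower one 6, so both finish at roughly the same time. A speed-blind split that gave each reducer 10 records would leave the slower reducer finishing at 10 s.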

Big data refers to data so large that it exceeds the processing capabilities of traditional systems. Big data can be awkward to work with, and its storage, processing and analysis can be problematic. MapReduce is a recent programming model that can handle big data. It does so by distributing the storage and processing of data among a large number of computers (nodes). However, this means the time required to process a MapReduce job depends on whichever node is last to complete its task. Heterogeneous environments, in which nodes differ in processing capability, exacerbate this problem.

In this paper we propose a method to improve MapReduce execution in heterogeneous environments. This is done by dynamically partitioning data before the Map phase and by using virtual machine mapping in the Reduce phase in order to maximize resource utilization. Simulation and experimental results show an improvement in MapReduce performance, including data locality and total completion time, with different optimization approaches.
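The point that a job finishes only when its last node finishes can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the paper's partitioning algorithm. It assumes every node's speed (records per second) is known before the Map phase and that each node's time is simply its share divided by its speed. The helper names `completion_time` and `proportional_split`, and the speeds used, are hypothetical.

```python
def completion_time(splits, speeds):
    """A job finishes only when its slowest node finishes its share."""
    return max(records / speed for records, speed in zip(splits, speeds))


def proportional_split(total, speeds):
    """Split `total` input records in proportion to node speed (records/s).

    Assumes integer record counts; rounding leftovers go to the fastest nodes.
    """
    capacity = sum(speeds)
    splits = [int(total * speed // capacity) for speed in speeds]
    leftover = total - sum(splits)
    fastest_first = sorted(range(len(speeds)), key=lambda i: -speeds[i])
    for i in fastest_first[:leftover]:
        splits[i] += 1
    return splits


speeds = [1, 2, 3]      # hypothetical records/s for three heterogeneous nodes
equal = [400, 400, 400]  # naive equal split of 1200 records
balanced = proportional_split(1200, speeds)
print(balanced)                           # [200, 400, 600]
print(completion_time(equal, speeds))     # 400.0
print(completion_time(balanced, speeds))  # 200.0
```

With an equal split, the slowest node dominates (400 s) while the fastest sits idle after about 133 s. Sizing partitions to node speed lets all three nodes finish together at 200 s. In practice node speeds are only estimates, which is one motivation for adjusting partitions dynamically rather than fixing them up front.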
