Article ID: 425246
Journal: Future Generation Computer Systems
Published Year: 2014
Pages: 11
File Type: PDF
Abstract

• A decoupled MapReduce computing-storage system for cloud computing is proposed.
• Data perception mechanism after decoupling makes tasks read the closest data.
• We design a VM placement strategy to exploit the data locality of tasks.
• A load-aware data placement strategy is complementary to the VM placement.

MapReduce as a service enjoys wide adoption in commercial clouds today [3] and [23]. However, most cloud providers simply deploy native Hadoop [24] systems on their cloud platforms to provide MapReduce services, without adapting them to these virtualized environments [6] and [25]. In cloud environments, the basic execution units of data processing are virtual machines (VMs). Each user’s virtual cluster must deploy HDFS [26] every time it is initialized, and the user’s input and output data must be transferred between that HDFS and external persistent storage for native Hadoop to work properly. These costly data movements can significantly degrade the performance of MapReduce jobs in the cloud.

We present Morpho, a modified version of the Hadoop MapReduce framework that decouples storage and computation into physical clusters and virtual clusters, respectively. In Morpho, map and reduce tasks still run in VMs, but without corresponding ad hoc HDFS deployments; instead, HDFS is deployed on the underlying physical machines. During MapReduce computation, map tasks read data directly from the physical machines without any extra data transfers. We design a data location perception module to improve cooperation between the computation and storage layers: it lets map tasks obtain information about the network topology of the physical machines and about VM placement. Morpho also achieves high performance through two complementary strategies, one for data placement and one for VM placement, which provide better map and reduce input locality. Furthermore, our data placement strategy mitigates resource contention between jobs.

The evaluation of our Morpho prototype shows that it achieves a nearly 62% speedup in job execution time and a significant reduction in system-wide network traffic compared with the traditional cloud computing scheme used by Amazon and other cloud providers.
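The idea of location perception, where a map task running in a VM reads from the nearest physical machine that stores its input block, can be illustrated with a minimal Python sketch. This is not Morpho's implementation: the names `Host`, `distance`, and `closest_replica`, the VM-to-host mapping, and the three-level distance values (0 for the same machine, 2 for the same rack, 4 otherwise) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A physical machine, identified by name and rack (hypothetical model)."""
    name: str
    rack: str


def distance(a: Host, b: Host) -> int:
    """Assumed topology distance: 0 same machine, 2 same rack, 4 different rack."""
    if a.name == b.name:
        return 0
    if a.rack == b.rack:
        return 2
    return 4


def closest_replica(vm: str, vm_to_host: dict, replicas: list) -> Host:
    """Pick the replica holder nearest to the physical host running `vm`.

    Ties are broken by host name so the choice is deterministic.
    """
    host = vm_to_host[vm]
    return min(replicas, key=lambda r: (distance(host, r), r.name))


# Usage: vm-a runs on pm1 (rack r1); the block is stored on pm2 (r1) and pm3 (r2).
pm1, pm2, pm3 = Host("pm1", "r1"), Host("pm2", "r1"), Host("pm3", "r2")
placement = {"vm-a": pm1}
print(closest_replica("vm-a", placement, [pm2, pm3]).name)  # prints "pm2"
```

The sketch only covers the read path. The VM placement and load-aware data placement strategies named in the abstract would decide the contents of the placement map and the replica list, and they are not modeled here.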

Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics