Article ID: 4961332
Journal: Procedia Computer Science
Published Year: 2016
Pages: 9
File Type: PDF
Abstract

Hadoop clusters are widely used in both business and academia. Hadoop performance depends on many factors, including the number and clock frequency of CPU cores, RAM capacity, storage throughput, data-flow intensity, and network bandwidth and latency. In a heterogeneous computing environment, this makes it difficult to distribute data optimally across the computing and storage resources of a Hadoop cluster. In this paper, we propose an approach to improving data placement and suggest an implementation of the presented algorithm on the Hadoop platform. The proposed method uses the HDFS distributed cache to improve task execution performance. As a result, the algorithm reduces the overall execution time of MapReduce tasks and increases I/O rates during the map stage.
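The abstract does not spell out the placement algorithm, so the following is only a minimal sketch of one generic heterogeneity-aware idea: giving each node a share of data blocks proportional to a relative speed score. It is not the authors' method and does not touch HDFS or the distributed cache. The function name `proportional_placement` and the `node_speeds` metric are hypothetical, introduced purely for illustration.

```python
def proportional_placement(num_blocks, node_speeds):
    """Split num_blocks across nodes in proportion to a relative speed score.

    node_speeds maps a node name to a positive number (a hypothetical
    benchmark score, not a real Hadoop metric). Returns a dict mapping
    each node name to a block count; the counts sum to num_blocks.
    """
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    if not node_speeds or any(s <= 0 for s in node_speeds.values()):
        raise ValueError("node_speeds must be non-empty with positive scores")

    total = sum(node_speeds.values())
    # Ideal (fractional) share of blocks for each node.
    shares = {n: num_blocks * s / total for n, s in node_speeds.items()}
    # Start from the integer part of each share.
    counts = {n: int(share) for n, share in shares.items()}
    leftover = num_blocks - sum(counts.values())
    # Largest-remainder rounding: hand the remaining blocks to the nodes
    # whose fractional part is largest.
    by_remainder = sorted(
        node_speeds, key=lambda n: shares[n] - counts[n], reverse=True
    )
    for n in by_remainder[:leftover]:
        counts[n] += 1
    return counts


if __name__ == "__main__":
    # Three nodes with assumed relative speeds 3 : 2 : 1 sharing 10 blocks.
    print(proportional_placement(10, {"fast": 3, "mid": 2, "slow": 1}))
```

A real placement policy would also have to account for replication, rack awareness, and storage limits, which this sketch deliberately ignores.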
