Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4961332 | Procedia Computer Science | 2016 | 9 | |
Hadoop clusters are widely used in both business and academia. Hadoop performance depends on many factors, such as the number and clock frequency of CPU cores, RAM capacity, storage throughput, data-flow intensity, and network bandwidth and latency. In a heterogeneous computing environment, this raises problems such as how to distribute data optimally across the cluster's computing and storage resources. In this paper, we propose an approach to improving data placement and present an implementation of the algorithm on the Hadoop platform. The proposed method uses the HDFS distributed cache to improve task execution performance. As a result, the algorithm reduces the overall execution time of MapReduce tasks and increases I/O rates during the map stage.
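To make the data-distribution problem concrete, the sketch below splits a file's blocks across nodes in proportion to each node's relative processing rate, so that faster nodes hold more local data for the map stage. This is a generic heterogeneity-aware heuristic chosen for illustration, not the algorithm proposed in the paper. The function name, node names, and rate values are hypothetical, and real HDFS placement is governed by its own block placement policy and replication factor.

```python
from typing import Dict


def proportional_block_counts(total_blocks: int, node_rates: Dict[str, float]) -> Dict[str, int]:
    """Hypothetical helper: split total_blocks across nodes in proportion to node_rates.

    Uses largest-remainder rounding so the counts always sum to total_blocks.
    Illustrative only; this is NOT the paper's placement algorithm.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not node_rates or any(rate <= 0 for rate in node_rates.values()):
        raise ValueError("node_rates must be non-empty and strictly positive")

    total_rate = sum(node_rates.values())
    # Ideal (fractional) number of blocks each node would receive.
    ideal = {node: total_blocks * rate / total_rate for node, rate in node_rates.items()}
    # Start from the floor of each ideal share.
    counts = {node: int(share) for node, share in ideal.items()}

    # Give the blocks lost to flooring to the nodes with the largest fractional remainders.
    leftover = total_blocks - sum(counts.values())
    by_remainder = sorted(ideal, key=lambda node: ideal[node] - counts[node], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


# Example: node-a is assumed to process data twice as fast as node-b and node-c.
counts = proportional_block_counts(10, {"node-a": 2.0, "node-b": 1.0, "node-c": 1.0})
print(counts)  # {'node-a': 5, 'node-b': 3, 'node-c': 2}
```

Largest-remainder rounding is used so that every block is assigned exactly once. In a real cluster, the rates would have to come from measurements of the factors listed above (CPU, storage throughput, network), and the result would still have to respect replication and rack constraints.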