Article ID: 554315
Journal: IERI Procedia
Published Year: 2014
Pages: 6 Pages
File Type: PDF
Abstract

MapReduce is a popular parallel programming model used to solve a wide range of Big Data applications in cloud computing environments. Hadoop is an open source implementation of MapReduce and is widely used. It provides an abstracted environment for running large-scale, data-intensive applications in a scalable and fault-tolerant manner. Several Hadoop scheduling algorithms with various performance goals have been proposed in the literature. In this paper, a new optimal task selection scheme is introduced to assist the scheduler when multiple local tasks are available for a node. To increase the probability that a larger percentage of a job's tasks can be launched locally in the future, the scheduler chooses among the local tasks available for a node based on three factors: it launches the task with the least number of replicas of its input, taking into account the individual load of the disks attached to the node, and the maximum expected time to wait for the next local node. The proposed method was evaluated through extensive experiments, and it was observed to improve performance significantly. The experiments show an improvement of around 20% in terms of locality and fairness.
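
The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how the three factors are combined, so the lexicographic ordering used here (fewest input replicas first, then lowest disk load, then longest expected wait for the next local node) is an assumption, as is preferring the lowest disk load. The names `LocalTask` and `select_local_task` and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LocalTask:
    """A pending task whose input has a replica on the requesting node (hypothetical type)."""
    task_id: str
    replica_count: int    # number of replicas of the task's input block
    disk_load: float      # load of the local disk holding the replica
    expected_wait: float  # expected time until another local node becomes free


def select_local_task(tasks: Sequence[LocalTask]) -> LocalTask:
    """Pick one local task to launch on the node.

    Assumed ordering (not stated in the abstract): fewest replicas,
    then lowest disk load, then longest expected wait.
    """
    if not tasks:
        raise ValueError("no local tasks available for this node")
    return min(
        tasks,
        key=lambda t: (t.replica_count, t.disk_load, -t.expected_wait),
    )


# Made-up candidates for a single node.
candidates = [
    LocalTask("t1", replica_count=3, disk_load=0.2, expected_wait=5.0),
    LocalTask("t2", replica_count=1, disk_load=0.7, expected_wait=2.0),
    LocalTask("t3", replica_count=1, disk_load=0.4, expected_wait=9.0),
]
chosen = select_local_task(candidates)
print(chosen.task_id)
```

The intuition behind launching the task with the fewest replicas first is that such a task has the fewest other nodes on which it could later run locally, so deferring it is the most likely to cost locality. A weighted score over the three factors would be an equally plausible reading of the abstract.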

Related Topics
Physical Sciences and Engineering > Computer Science > Information Systems