An Optimal Task Selection Scheme for Hadoop Scheduling

Article ID	Journal	Published Year	Pages	File Type
554315	IERI Procedia	2014	6 Pages	PDF

Abstract

MapReduce is a popular parallel programming model used to solve a wide range of Big Data applications in cloud computing environments. Hadoop is an open-source implementation of MapReduce and is widely used by a large number of users. It provides an abstracted environment for running large-scale, data-intensive applications in a scalable and fault-tolerant manner. Several Hadoop scheduling algorithms with various performance goals have been proposed in the literature. In this paper, a new optimal task selection scheme is introduced to assist the scheduler when multiple local tasks are available for a node. To increase the percentage of local tasks that can be launched for a job in the future, the scheme chooses which of the node's available local tasks to launch based on three factors: the number of replicas of the task's input (the fewest is preferred), the individual load of the disks attached to the node, and the expected time to wait for the next local node (the maximum is preferred). The proposed method was evaluated through extensive experiments, which show improvements of around 20% in terms of locality and fairness.
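The selection rule described in the abstract can be illustrated with a small sketch. The abstract does not say how the three factors are combined, so this example assumes a simple lexicographic priority. It prefers the fewest input replicas first, then the least-loaded disk (assuming lower disk load is better), then the longest expected wait for the next local node. The names `LocalTask`, `selection_key`, and `select_local_task` and their fields are hypothetical and are not Hadoop APIs or the authors' code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LocalTask:
    """Hypothetical view of a data-local task pending for one node."""
    task_id: str
    replica_count: int      # number of replicas of the task's input split
    disk_load: float        # load of the node's disk holding the input (assumed: lower is better)
    expected_wait_s: float  # expected time until another node local to this input becomes free


def selection_key(task: LocalTask) -> Tuple[int, float, float]:
    # Assumed lexicographic priority, not the paper's exact formula:
    # 1. fewest input replicas (fewest alternative nodes where the task could run locally)
    # 2. least-loaded disk
    # 3. longest expected wait for the next local node (negated so min() prefers it)
    return (task.replica_count, task.disk_load, -task.expected_wait_s)


def select_local_task(candidates: Sequence[LocalTask]) -> Optional[LocalTask]:
    """Pick one task to launch among the local tasks available for a node."""
    if not candidates:
        return None
    return min(candidates, key=selection_key)


if __name__ == "__main__":
    pending = [
        LocalTask("t1", replica_count=3, disk_load=0.2, expected_wait_s=5.0),
        LocalTask("t2", replica_count=1, disk_load=0.7, expected_wait_s=12.0),
        LocalTask("t3", replica_count=1, disk_load=0.4, expected_wait_s=3.0),
        LocalTask("t4", replica_count=1, disk_load=0.4, expected_wait_s=8.0),
    ]
    # t4: single replica, lightest disk among single-replica tasks, longer wait than t3
    print(select_local_task(pending).task_id)  # prints "t4"
```

Launching a task whose input has only one replica keeps locality options open. The tasks with several replicas can still be launched locally on other nodes later. A weighted score would be an equally plausible way to combine the factors.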