Performance Improvement of MapReduce Framework in Heterogeneous Context using Reinforcement Learning

Article ID	Journal	Published Year	Pages	File Type
489850	Procedia Computer Science	2015	7 Pages	PDF

Abstract

MapReduce is an established distributed and parallel programming model that is widely used for large-scale computing. Intelligent scheduling decisions can reduce the overall runtime of jobs. MapReduce performance is currently limited by its default scheduler, which does not adapt well to heterogeneous environments. The Longest Approximate Time to End (LATE) scheduler was designed with heterogeneous environments in mind, but it too has several shortcomings because of the static manner in which it computes task progress. Recent research has begun to address this lack of an adequate approach to heterogeneous environments. In this paper, we propose a novel MapReduce scheduler for heterogeneous environments based on reinforcement learning, called the MapReduce Reinforcement Learning scheduler, which observes the system state of task execution and suggests speculative re-execution of slower tasks on other available nodes in the cluster for faster execution. The proposed approach adapts to the heterogeneous environment, and no prior knowledge of the environmental characteristics is required. It is expected that over a few runs the system will be able to better map computing requirements to the resources available in a heterogeneous cluster and minimize the overall job completion time.
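
The abstract does not say which reinforcement learning algorithm the scheduler uses, so the following is only a minimal sketch of the general idea, assuming tabular Q-learning with an epsilon-greedy policy. The two-valued state ("slow" or "normal"), the two actions ("wait" or "speculate"), the 0.5 slowness threshold, and the toy reward function are all hypothetical choices made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict

# Hypothetical action set: leave the task alone, or launch a speculative copy.
ACTIONS = ("wait", "speculate")


class SpeculationAgent:
    """Tabular Q-learning agent (an assumed RL variant, not the paper's stated method)."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def choose(self, state):
        # Epsilon-greedy: explore occasionally, otherwise take the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update toward reward + gamma * max_a Q(next_state, a).
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def discretize(progress_rate, mean_rate):
    """Map a task's observed progress rate to a coarse state (threshold is assumed)."""
    return "slow" if progress_rate < 0.5 * mean_rate else "normal"


def simulated_reward(state, action):
    """Toy environment (hypothetical): speculation helps slow tasks and
    wastes a slot on tasks that are progressing normally."""
    if state == "slow":
        return 1.0 if action == "speculate" else -1.0
    return -0.5 if action == "speculate" else 0.5


def train(episodes=500, seed=0):
    agent = SpeculationAgent(seed=seed)
    rng = random.Random(seed)
    for _ in range(episodes):
        # Draw a task progress rate as if from a heterogeneous cluster.
        rate = rng.uniform(0.1, 2.0)
        state = discretize(rate, mean_rate=1.0)
        action = agent.choose(state)
        reward = simulated_reward(state, action)
        # Each decision is treated as a one-step episode ending in "done",
        # whose Q-values are never updated and therefore stay at zero.
        agent.update(state, action, reward, "done")
    return agent
```

Calling `train()` and inspecting `agent.q` shows the agent learning, without being told anything about the nodes in advance, to prefer `"speculate"` for slow tasks and `"wait"` otherwise. A real scheduler would need a richer state (for example node load and task type) and a reward derived from measured completion times.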