Article ID: 453938
Journal: Computers & Electrical Engineering
Published Year: 2016
Pages: 14 Pages
File Type: PDF
Abstract

MapReduce is a promising distributed computing platform for large-scale data processing applications. Hadoop MapReduce is considered one of the most extensively used open-source implementations of the MapReduce framework because of its flexible customization and convenient usage. Despite these advantages, relatively slow-running tasks, called straggler tasks, impede job progress. In this study, two novel speculative strategies, namely Estimate Remaining time Using Linear relationship model (ERUL) and extensional Maximum Cost Performance (exMCP), are developed to improve the estimation of the remaining time of a task. ERUL is a dynamic, system-load-aware strategy that overcomes some drawbacks of Longest Approximate Time to End (LATE), which misleads speculative execution in some cases. In exMCP, different slot values are considered. Extensive experiments show that ERUL and exMCP accurately estimate the remaining execution times of running tasks and reduce the running time of a job.
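For context, the LATE baseline named in the abstract is commonly described as estimating a task's time to end from its progress score and its average progress rate so far. The sketch below illustrates that baseline only; it is not the ERUL or exMCP model, whose details the abstract does not give. The function names `late_remaining_time` and `pick_speculation_candidate` and the sample task values are hypothetical, chosen for illustration.

```python
def late_remaining_time(progress_score: float, elapsed_s: float) -> float:
    """LATE-style estimate of the time left for a running task.

    Assumes the task keeps its average progress rate so far:
        progress_rate = progress_score / elapsed_s
        remaining     = (1 - progress_score) / progress_rate
    """
    if not 0.0 < progress_score <= 1.0:
        raise ValueError("progress_score must be in (0, 1]")
    if elapsed_s <= 0.0:
        raise ValueError("elapsed_s must be positive")
    progress_rate = progress_score / elapsed_s
    return (1.0 - progress_score) / progress_rate


def pick_speculation_candidate(tasks: dict) -> str:
    """Return the id of the task with the longest estimated time to end.

    `tasks` maps a (hypothetical) task id to (progress_score, elapsed_s).
    """
    estimates = {
        task_id: late_remaining_time(score, elapsed)
        for task_id, (score, elapsed) in tasks.items()
    }
    return max(estimates, key=estimates.get)


if __name__ == "__main__":
    # Illustrative values only, not measurements from the paper.
    running = {"task_a": (0.9, 90.0), "task_b": (0.2, 60.0), "task_c": (0.5, 50.0)}
    for task_id, (score, elapsed) in running.items():
        print(task_id, late_remaining_time(score, elapsed))
    print("speculate on:", pick_speculation_candidate(running))
```

Because this estimate extrapolates a single average rate, it has no way to react when the load on a node changes partway through a task. That is one plausible reading of the drawback that a load-aware strategy such as ERUL is meant to address.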

Highlights
• A novel speculative strategy of remaining time estimation is presented.
• An extensional maximum cost performance is developed.
• The system load is considered while estimating the remaining time.
• The proposed Hadoop-ERUL works more precisely and rapidly.

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications