Article code | Journal code | Publication year | English article | Full-text version
494926 | 862809 | 2016 | 10-page PDF | Free download
English title of the ISI article
Scheduling algorithm based on prefetching in MapReduce clusters
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract


• We explain in detail the architecture of the prefetching module in Section 4.4.
• We detail the framework of HPSO with an example in Section 4.1.
• We modify the prefetching-based scheduling algorithm to fully exploit potential map tasks with data locality in Section 4.3.1. This method has the advantage of reducing network transmission. Furthermore, we consider only those nodes whose remaining time is less than the threshold Tunder, to avoid invalid data prefetching.
• We conduct a series of experiments to evaluate the performance of the proposed system using five different applications (Section 5).
• A survey of state-of-the-art methods for improving data locality is conducted in Section 6.

Due to cluster resource competition and the task scheduling policy, some map tasks are assigned to nodes that do not hold their input data, which causes significant data access delay. Data locality is becoming one of the most critical factors affecting the performance of MapReduce clusters. Because machines in MapReduce clusters have large memory capacities that are often underutilized, prefetching input data into memory is an effective way to improve data locality. However, deciding what to prefetch and when still poses serious challenges to cluster designers. To use prefetching effectively, we have built HPSO (High Performance Scheduling Optimizer), a task scheduler based on a prefetching service that improves data locality for MapReduce jobs. The basic idea is to predict the most appropriate nodes for future map tasks based on the current pending tasks and then preload the needed data into memory without delaying the launch of new tasks. To this end, we have implemented HPSO in Hadoop-1.1.2. The experimental results show that the method reduces the number of map tasks that incur remote data access delay and improves the performance of Hadoop clusters.
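
The prediction-then-preload idea can be illustrated with a minimal sketch. This is not the HPSO implementation and does not use any Hadoop API. The names `Node`, `PendingTask`, `plan_prefetch`, and `t_under` are hypothetical, and the greedy matching rule is an assumption made for illustration. The sketch assumes each node reports an estimated time until a map slot frees up, and it considers only nodes whose remaining time is below the threshold, so that data is not preloaded for slots that will not open soon.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    remaining_time: float  # assumed estimate: seconds until a map slot frees up


@dataclass
class PendingTask:
    task_id: str
    replica_nodes: frozenset  # names of nodes that store the task's input block


def plan_prefetch(nodes, pending, t_under):
    """Greedy sketch: predict a task for each node that frees a slot soon.

    Returns (assignments, prefetches):
      assignments: {node name: task id} predicted for the node's next free slot
      prefetches:  [(task id, node name)] inputs to preload into that node's memory
    """
    assignments = {}
    prefetches = []
    remaining = list(pending)
    # Only nodes below the threshold are considered, soonest-to-finish first.
    candidates = sorted(
        (n for n in nodes if n.remaining_time < t_under),
        key=lambda n: n.remaining_time,
    )
    for node in candidates:
        if not remaining:
            break
        # Prefer a pending task whose input block is already on this node.
        local = next((t for t in remaining if node.name in t.replica_nodes), None)
        task = local if local is not None else remaining[0]
        remaining.remove(task)
        assignments[node.name] = task.task_id
        if local is None:
            # No local task exists, so the input must be preloaded before launch.
            prefetches.append((task.task_id, node.name))
    return assignments, prefetches


nodes = [Node("n1", 2.0), Node("n2", 5.0), Node("n3", 30.0)]
pending = [
    PendingTask("t1", frozenset({"n3"})),
    PendingTask("t2", frozenset({"n1"})),
    PendingTask("t3", frozenset({"n3"})),
]
assignments, prefetches = plan_prefetch(nodes, pending, t_under=10.0)
```

In this example `n3` is skipped because its remaining time exceeds the threshold, `n1` receives the task that is already local to it, and `n2` receives a non-local task whose input is queued for prefetching. A real scheduler would also need to bound memory use and handle mispredictions, which this sketch leaves out.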

Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 38, January 2016, Pages 1109–1118