Article code: 464719
Journal code: 697396
Publication year: 2016
English article: 11-page PDF
Full-text version: Free download
English title of the ISI article
Data locality in MapReduce: A network perspective
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications
English abstract

Data locality, a critical consideration for the performance of task scheduling in MapReduce, has been addressed in the literature by increasing the number of locally processed tasks. In this paper, we view the data locality problem from a network perspective. The key observation is that if we make appropriate use of the network to route the data chunk to the machine where it will be processed in advance, then processing a remote task is the same as processing a local task. However, to benefit from such a strategy, we must (i) balance the tasks assigned to local machines and those assigned to remote machines, and (ii) design the routing algorithm to avoid network congestion. Taking these challenges into consideration, we propose a scheduling/routing algorithm, named the Joint Scheduler, which utilizes both the computing resources and the communication network efficiently. We prove that the Joint Scheduler is throughput optimal; i.e., it supports any load that is supportable by any other algorithm. Simulation results demonstrate that with popularity skew, the Joint Scheduler improves the throughput and delay performance significantly compared to the Hadoop Fair Scheduler with delay scheduling, which is the de facto industry standard.

Publisher
Database: Elsevier - ScienceDirect
Journal: Performance Evaluation - Volume 96, February 2016, Pages 1–11