A task routing approach to large-scale scheduling

Article ID	Journal	Published Year	Pages	File Type
425062	Future Generation Computer Systems	2013	15 Pages	PDF

Abstract

Scheduling many tasks in environments of millions of unreliable nodes is a challenging problem. To our knowledge, no work in the literature has proposed a solution that also supports many policies with very different objectives. In this paper, we present a decentralized scheduling model that overcomes these problems. A hierarchical network overlay supports a scalable resource discovery and allocation scheme. It uses aggregated information to route tasks to the most suitable execution nodes, and is easily extensible to provide very different scheduling policies. For this paper, we implemented a policy that just allocates tasks to idle nodes, a policy that minimizes the global makespan and a policy that fulfills deadline requirements. With thorough simulation tests, we conclude that our model allocates any number of tasks to several million nodes in just a few seconds, with very low overhead and high resilience. Meanwhile, policies with different objectives implemented on our model perform almost as well as their centralized counterpart.

► We present a distributed scheduling model for large-scale platforms. ► Execution node availability information is aggregated in a tree-like overlay. ► Tasks are allocated to nodes with logarithmic cost with the number of tasks. ► Many policies are supported, with performance close to a centralized model. ► The model recovers from the simultaneous failure of a certain number of nodes.

Keywords

Scheduling Distributed systems Network protocols