Article code | Journal code | Publication year | English article |
---|---|---|---|
432311 | 688855 | 2015 | 12-page PDF |
• Selectively migrating locality-flexible tasks strikes a balance between load balancing and data locality.
• Locality-aware distributed scheduling yields speedups of up to 32% over competing techniques.
• Applications with coarse-grained tasks benefit more from this technique.
• Applications that are not suited to locality-aware scheduling, or that have fine-grained tasks, do not lose performance with this technique.
What are the performance benefits of selectively relaxing the locality preferences of some tasks in parallel applications? Can load-balancing algorithms for a distributed-memory cluster benefit from this relaxation? This work investigates these questions by using application-level task locality, rather than hardware memory topology as is the norm in the literature, to select tasks for migration. A prototype for evaluating these ideas is implemented in X10, a realization of the asynchronous partitioned global address space (APGAS) programming model. The evaluation shows that the approach applies to several real-world applications drawn from the Cowichan and Lonestar benchmark suites. On a cluster of 128 processors, the new work-stealing strategy achieves speedups of 12% to 32% over X10's existing scheduler. Moreover, it does not degrade the performance of any of the applications studied.
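The abstract does not give the scheduler's code. As a rough illustration only, the Python sketch below simulates selective work stealing under several assumptions that are not taken from the paper. Each place keeps a deque of tasks. Each task carries a hypothetical `flexible` flag saying whether it may run away from its home place. The owner runs tasks from the tail of its deque, while an idle thief scans from the head and takes only the first locality-flexible task. To keep the example deterministic, the victim is simply the place with the longest queue. None of the names (`Task`, `Place`, `steal_flexible`, `run`) come from X10 or from the authors' prototype.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    home: int       # place that holds this task's data (assumed model)
    flexible: bool  # hypothetical flag: True if the task may migrate


@dataclass
class Place:
    pid: int
    queue: deque = field(default_factory=deque)
    executed: list = field(default_factory=list)


def steal_flexible(victim: Place) -> Optional[Task]:
    """Remove and return the oldest locality-flexible task, or None.

    Locality-sensitive tasks are never migrated; they stay with their data.
    """
    for i, task in enumerate(victim.queue):
        if task.flexible:
            del victim.queue[i]  # safe: we return immediately after deleting
            return task
    return None


def run(places: List[Place], allow_steal: bool = True) -> int:
    """Simulate round-based execution and return the number of rounds."""
    rounds = 0
    while any(p.queue for p in places):
        rounds += 1
        if allow_steal:
            # Idle places try to steal before this round's execution step.
            for p in places:
                if not p.queue:
                    victim = max(places, key=lambda v: len(v.queue))
                    task = steal_flexible(victim)
                    if task is not None:
                        p.queue.append(task)
        # Every busy place executes one task from its own end of the deque.
        for p in places:
            if p.queue:
                p.executed.append(p.queue.pop())
    return rounds


def make_places() -> List[Place]:
    """Two places; all work starts on place 0, half of it locality-flexible."""
    p0 = Place(0, deque([Task("a", 0, False), Task("b", 0, True),
                         Task("c", 0, False), Task("d", 0, True)]))
    return [p0, Place(1)]


if __name__ == "__main__":
    print(run(make_places(), allow_steal=False))  # 4 rounds without stealing
    print(run(make_places(), allow_steal=True))   # 3 rounds with selective stealing
```

In this toy run, selective stealing shortens the schedule from 4 rounds to 3, while the two locality-sensitive tasks (`a` and `c`) still execute on their home place. A real distributed scheduler would also have to weigh the cost of moving a task and its data against the idle time the steal saves.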
Journal: Journal of Parallel and Distributed Computing, Volume 76, February 2015, Pages 94–105