Article code: 524640
Journal code: 868800
Publication year: 2013
English article: 16 pages
Full-text version: PDF
English title of the ISI article
Toward balanced and sustainable job scheduling for production supercomputers
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
English abstract


• Motivated by practical issues in HEC scheduling: conflicting goals and changing workloads.
• Proposed novel scheduling strategies to achieve balanced and sustainable job scheduling.
• Conducted comprehensive experiments on workloads from multiple supercomputer centers.
• Detailed interpretation of the results demonstrated the effectiveness of the new methods.
• Extended by new experiments, more insightful analysis, and clearer presentation.

Job scheduling on production supercomputers is complicated by the diverse demands of system administrators and the amorphous characteristics of workloads. Specifically, scheduling goals such as queuing efficiency and system utilization usually conflict and thus need to be balanced. In addition, changing workload characteristics often affect the effectiveness of the deployed scheduling policies. It is therefore challenging to design a versatile scheduling policy that is effective in all circumstances. In this paper, we propose a novel job scheduling strategy that balances diverse scheduling goals and mitigates the impact of changing workload characteristics. First, we introduce metric-aware scheduling, which enables the scheduler to balance competing scheduling goals represented by different metrics such as job waiting time, fairness, and system utilization. Second, we design a scheme that dynamically adjusts scheduling policies at runtime based on feedback from the monitored metrics. We evaluate our design using real workloads from supercomputer centers. The results demonstrate that our scheduling mechanism can significantly improve system performance in a balanced, sustainable fashion.
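The abstract names two mechanisms (metric-aware scheduling and runtime feedback adjustment) without giving formulas. The following is a minimal sketch of the general idea under our own assumptions, not the authors' method. A hypothetical weight `alpha` blends a normalized wait-time term (standing in for queuing efficiency and fairness) with a normalized job-size term (standing in for utilization). A hypothetical feedback rule `adjust_alpha` nudges that weight whenever a monitored average wait time crosses a target. The names `Job`, `priority`, `order_queue`, `schedule`, `adjust_alpha`, `target_wait`, and `step` are illustrative only. The start loop is a simple first-fit pass without reservations or backfilling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    job_id: str
    wait_time: float  # seconds the job has spent in the queue so far
    nodes: int        # requested node count


def priority(job: Job, alpha: float, max_wait: float, max_nodes: int) -> float:
    """Hypothetical metric-aware score: alpha weights a wait-time term
    against a job-size term, both normalized to [0, 1] over the queue."""
    wait_term = job.wait_time / max_wait if max_wait > 0 else 0.0
    size_term = job.nodes / max_nodes if max_nodes > 0 else 0.0
    return alpha * wait_term + (1.0 - alpha) * size_term


def order_queue(queue: List[Job], alpha: float) -> List[Job]:
    """Sort queued jobs by descending blended score."""
    max_wait = max((j.wait_time for j in queue), default=0.0)
    max_nodes = max((j.nodes for j in queue), default=0)
    return sorted(queue, key=lambda j: priority(j, alpha, max_wait, max_nodes),
                  reverse=True)


def schedule(queue: List[Job], free_nodes: int, alpha: float) -> List[str]:
    """First-fit pass (no reservations): start each job, in score order,
    if it fits in the nodes still free; return the IDs of started jobs."""
    started = []
    for job in order_queue(queue, alpha):
        if job.nodes <= free_nodes:
            started.append(job.job_id)
            free_nodes -= job.nodes
    return started


def adjust_alpha(alpha: float, avg_wait: float, target_wait: float,
                 step: float = 0.1) -> float:
    """Hypothetical feedback rule: if the monitored average wait exceeds the
    target, shift weight toward the wait-time term; otherwise shift it toward
    the job-size (utilization) term. The result is clamped to [0, 1]."""
    alpha = alpha + step if avg_wait > target_wait else alpha - step
    return min(1.0, max(0.0, alpha))
```

For example, consider three queued jobs: `a` (3600 s wait, 16 nodes), `b` (60 s wait, 512 nodes), and `c` (1800 s wait, 128 nodes), with 600 free nodes. With `alpha=1.0` the pass starts `a` and `c`. With `alpha=0.0` it starts `b` and then `a`. Calling `adjust_alpha` once per monitoring interval moves the scheduler between these extremes in steps of 0.1. A single blended score keeps the trade-off explicit in one tunable parameter, which is what makes runtime feedback straightforward in this sketch.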

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 39, Issue 12, December 2013, Pages 753–768
Authors
, , , ,