Article ID: 4950358 | Journal ID: 1440640 | Publication year: 2017 | English article: 12 pages (PDF) | Full-text version: free download
English title of the ISI article
A Critical Path File Location (CPFL) algorithm for data-aware multiworkflow scheduling on HPC clusters
Related subjects
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
Abstract


- A multiworkflow scheduling strategy for clusters is proposed.
- The strategy combines critical-path scheduling with data awareness.
- The scheduler aims to improve the makespan of bioinformatics workflows.
- The simulator engine is extended to scale to a larger cluster infrastructure and a new storage hierarchy.

A representative set of workflows found in bioinformatics pipelines must deal with large data sets. Most scientific workflows are defined as Directed Acyclic Graphs (DAGs). Although DAGs are useful for understanding dependence relationships, they provide no information about input, output, and temporary data files. Knowing where the files of data-intensive applications are located helps to avoid performance issues.

This paper presents a multiworkflow, storage-aware scheduler for cluster environments, called the Critical Path File Location (CPFL) policy. It targets settings where disk access time is more relevant than network access time, and it extends the classical list scheduling policies. Our purpose is to find the best location for data files in a hierarchical storage system.

The resulting algorithm is tested on an HPC cluster and in a simulated cluster scenario with synthetic bioinformatics workflows and widely used benchmarks such as Montage and Epigenomics. The resulting simulator is tuned and validated with the first test results from the real infrastructure. The evaluation of our proposal shows promising results: makespan improvements of up to 70% on benchmarks on a real HPC cluster using 128 cores, and of up to 69% on simulated 512-core clusters, with the simulation deviating between 0.9% and 3% from the real HPC cluster.
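The abstract does not spell out the CPFL algorithm, so the following is only a minimal sketch of the general idea it builds on: critical-path ranking from classical list scheduling, with file access time depending on where each intermediate file is stored. The toy workflow, file sizes, tier names, and bandwidths are all hypothetical and are not taken from the paper.

```python
# Hypothetical toy workflow: task -> (compute time in seconds, successor tasks).
WORKFLOW = {
    "split":   (2.0, ["align_a", "align_b"]),
    "align_a": (5.0, ["merge"]),
    "align_b": (3.0, ["merge"]),
    "merge":   (1.0, []),
}

# Hypothetical size (MB) of the file passed along each producer -> consumer edge.
FILES_MB = {
    ("split", "align_a"): 400.0,
    ("split", "align_b"): 100.0,
    ("align_a", "merge"): 50.0,
    ("align_b", "merge"): 50.0,
}

# Assumed bandwidth (MB/s) of each tier in a hierarchical storage system.
TIER_MBPS = {"ramdisk": 2000.0, "local_disk": 200.0, "shared_fs": 50.0}


def access_time(edge, placement):
    """Time to read an edge's file; files default to the slowest tier."""
    tier = placement.get(edge, "shared_fs")
    return FILES_MB[edge] / TIER_MBPS[tier]


def upward_ranks(placement):
    """Longest remaining time from each task to the end of the workflow."""
    ranks = {}

    def rank(task):
        if task not in ranks:
            cost, successors = WORKFLOW[task]
            ranks[task] = cost + max(
                (access_time((task, s), placement) + rank(s) for s in successors),
                default=0.0,
            )
        return ranks[task]

    for task in WORKFLOW:
        rank(task)
    return ranks


def critical_path(placement):
    """Return (tasks on the critical path, its length in seconds)."""
    ranks = upward_ranks(placement)
    with_predecessor = {s for _, succs in WORKFLOW.values() for s in succs}
    current = max((t for t in WORKFLOW if t not in with_predecessor), key=ranks.get)
    path = [current]
    while WORKFLOW[current][1]:
        current = max(
            WORKFLOW[current][1],
            key=lambda s, src=current: access_time((src, s), placement) + ranks[s],
        )
        path.append(current)
    return path, ranks[path[0]]


if __name__ == "__main__":
    print(critical_path({}))
    print(critical_path({("split", "align_a"): "ramdisk"}))
```

In this toy case, moving the largest file on the critical path from the shared file system to a faster tier shortens the critical path, which is the kind of effect a data-aware placement policy tries to exploit. A real scheduler would also have to account for tier capacity, multiple concurrent workflows, and core availability, none of which this sketch models.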

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 74, September 2017, Pages 51-62