Article code: 425895 · Journal code: 685948 · Publication year: 2014 · English article, full-text version: 12 pages, PDF
English title of the ISI paper
A case study of large-scale parallel I/O analysis and optimization for numerical weather prediction system
Related subjects
Engineering and Basic Sciences · Computer Engineering · Computational Theory and Mathematics
English abstract


• We implement two parallel I/O strategies based on MPI-IO and ADIOS in GRAPES.
• “MPI_AMR” in ADIOS is ideal when the number of aggregators is two or four times the number of OSTs.
• 114 MB per OST is the optimal size for each I/O on the Tianhe-1A system.
• 8 time-step aggregation by “MPI_AMR” reaches 7.69 GB/s, 68.4% of theoretical bandwidth.
• There is an optimal number of OSTs to distribute one MPI file for good I/O performance.

Numerical weather forecasting is one of the most efficient means of reducing the impact of unexpected weather events. As prediction precision and time-critical requirements have grown, high-performance computing technology has improved considerably; however, I/O has become a significant performance bottleneck when scaling to thousands of processes. In this paper we analyze the I/O access patterns of GRAPES (Global/Regional Assimilation and Prediction System), a numerical weather prediction system, as a case of regular multi-dimensional data access, and we implement two parallel I/O strategies based on MPI-IO and ADIOS (Adaptive I/O System), making full use of efficient synchronous I/O schemes. For ADIOS, the “MPI_AMR” method is employed to improve parallel output bandwidth: aggregator processes execute the I/O operations, and each aggregator writes to its own subfile on one OST, reducing I/O conflicts.

Experiments show that both optimizations outperform the original sequential I/O access, achieving very impressive improvements on the Tianhe-1A and Sunway BlueLight systems in China. The I/O cost based on ADIOS accounts for no more than 9% of total time when scaling up to 2K processes on the Tianhe-1A system, whereas sequential I/O costs more than 50% of total time when scaling to 1K or more processes. The aggregate output based on ADIOS achieves the better output performance, peaking at 3.84 GB/s with one time-step output on the Tianhe-1A system. MPI-IO, in contrast, obtains the better input performance, peaking at 4.55 GB/s.

We use the GRAPES I/O component as a benchmark for a further study of I/O performance using ADIOS. From the rules found, we can design an efficient scheme for using “MPI_AMR” with ADIOS on the Tianhe-1A system, taking the 15-km horizontal resolution as an example. Since the maximum number of OSTs available for our tests on the Tianhe-1A system is no more than 80, 32 or 64 OSTs are chosen for parallel I/O, so the number of aggregators should be set to 64 or 128 respectively. The optimal data size of 114 MB per OST on the Tianhe-1A system can be determined with simple test cases. With 32 OSTs and 1024 processes, a 4-time-step aggregation can then be calculated, which obtains optimal I/O performance for that number of OSTs; the same holds when 64 OSTs are used. Hence time-step aggregation is useful for output optimization based on “MPI_AMR”, whose peak reaches 7.69 GB/s on 2K processes with 64 OSTs and 128 aggregators when 8-time-step aggregation is used.

We also examine the performance effects of data layout in the Lustre file system with MPI-IO: distributing data over more OSTs outperforms using only a small number of OSTs, but I/O performance becomes more likely to be disturbed when data is distributed over nearly all of the OSTs. This influence is more apparent with MPI-IO than with ADIOS.
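The tuning rule sketched in the abstract can be expressed as simple arithmetic: choose twice (or four times) as many aggregators as OSTs, and aggregate enough time steps that each OST receives roughly 114 MB per write. A minimal sketch, assuming a per-time-step output size of about 912 MB for the 15-km case (not stated in the abstract; back-derived from the reported configurations, where 32 OSTs give 4-step and 64 OSTs give 8-step aggregation):

```python
# Tuning rule for ADIOS "MPI_AMR" on Tianhe-1A, per the abstract.
OPT_MB_PER_OST = 114  # optimal data size per I/O on one OST (from the abstract)

def num_aggregators(n_osts: int, factor: int = 2) -> int:
    """Abstract's rule: aggregators = 2x (or 4x) the number of OSTs."""
    return factor * n_osts

def timestep_aggregation(per_step_mb: float, n_osts: int) -> int:
    """Time steps to aggregate so each OST receives ~114 MB per write."""
    return max(1, round(OPT_MB_PER_OST * n_osts / per_step_mb))

# Assumed (hypothetical) per-step output for 15-km resolution.
PER_STEP_MB = 912

print(num_aggregators(64))                    # 128 aggregators for 64 OSTs
print(timestep_aggregation(PER_STEP_MB, 32))  # 4-step aggregation
print(timestep_aggregation(PER_STEP_MB, 64))  # 8-step aggregation
```

Under these assumptions the rule reproduces the configurations reported in the abstract (4-step aggregation on 32 OSTs, 8-step on 64 OSTs with 128 aggregators).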

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 37, July 2014, Pages 378–389