Manycore challenge in particle-in-cell simulation: How to exploit 1 TFlops peak performance for simulation codes with irregular computation

Article code	Journal code	Publication year	English article
455223	695350	2015	14-page PDF

Keywords

Multithreading; Particle-in-cell simulation; High-performance computing

Related topics

Engineering and Basic Sciences > Computer Engineering > Computer Networks and Communications

Abstract

• The difficulty of exploiting inter-node, intra-node and intra-core parallelism is discussed.
• How manycore processors change the picture is illustrated with a PIC simulation code.
• A manycore- and SIMD-aware implementation improves performance 10-fold.

This paper discusses the challenges of the post-Peta and Exascale era, especially those brought by manycore processors built from ordinary (i.e., non-GPU type) CPU cores. Although a processor such as Intel Xeon Phi offers TFlops-class computational power and may lead us to Exascale computing, fully exploiting its potential is far from easy, because its high performance comes from large-scale multithreading and a wide SIMD mechanism. In fact, for the three tiers of parallelism (inter-node, intra-node and intra-core), we found that this order does not reflect the difficulty of HPC programming; the order must be reversed to do so, with intra-core parallelism the hardest to exploit. Our case study with a particle-in-cell plasma simulation code supports this observation: simply porting an existing code to Xeon Phi is infeasible from the viewpoint of performance, and the code structure must be changed significantly so that it conforms to the features of the processor. However, the study also confirms that the recoding effort is well rewarded, achieving single-node performance higher than that obtained on four dual-socket nodes of Cray XE6.
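The abstract does not say which restructuring was applied, so the following is only a generic sketch of why particle-in-cell computation is irregular and what a layout change can look like. It is not the paper's method. It assumes a 1D grid with nearest-grid-point charge deposition; the function names `deposit_scatter` and `deposit_sorted` are hypothetical, and Python is used purely for brevity (a production code would express the same idea in C or Fortran with threads and SIMD).

```python
from itertools import groupby


def deposit_scatter(positions, charges, n_cells, dx):
    """Particle-order deposition: each particle writes to a data-dependent cell."""
    rho = [0.0] * n_cells
    for x, q in zip(positions, charges):
        cell = int(x / dx) % n_cells  # index depends on particle data: irregular write
        rho[cell] += q                # neighbouring particles may hit the same cell
    return rho


def deposit_sorted(positions, charges, n_cells, dx):
    """Cell-order deposition: bin particles by cell, then reduce each bin."""
    keyed = sorted(
        ((int(x / dx) % n_cells, q) for x, q in zip(positions, charges)),
        key=lambda pair: pair[0],
    )
    rho = [0.0] * n_cells
    # After sorting, the particles of one cell form a contiguous run, so the
    # inner work becomes a plain reduction with a single write per cell.
    for cell, group in groupby(keyed, key=lambda pair: pair[0]):
        rho[cell] = sum(q for _, q in group)
    return rho


# Illustrative data only (not taken from the paper).
positions = [0.1, 3.7, 1.2, 0.4, 3.1, 2.9]
charges = [1.0, 1.0, 2.0, 1.0, 1.0, 0.5]
rho_a = deposit_scatter(positions, charges, n_cells=4, dx=1.0)
rho_b = deposit_sorted(positions, charges, n_cells=4, dx=1.0)
print(rho_a)  # [2.0, 2.0, 0.5, 2.0]
print(rho_b)  # [2.0, 2.0, 0.5, 2.0]
```

Both functions compute the same charge density. The first mirrors a straightforward port, where data-dependent writes hinder vectorization and can conflict across threads; the second shows the kind of regular, per-cell access pattern that wide SIMD units and many threads tend to favor, at the cost of an extra sorting step.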

Publisher

Database: Elsevier - ScienceDirect
Journal: Computers & Electrical Engineering - Volume 46, August 2015, Pages 81–94

Authors

Hiroshi Nakashima
