Manycore challenge in particle-in-cell simulation: How to exploit 1 TFlops peak performance for simulation codes with irregular computation

Article ID	Journal	Published Year	Pages	File Type
455223	Computers & Electrical Engineering	2015	14 Pages	PDF

Abstract

• Toughness of inter-node, intra-node and intra-core parallelism is discussed.
• The change of story brought by manycore processors is exemplified with a PIC simulation code.
• Manycore- and SIMD-aware implementation improves the performance 10-fold.

This paper discusses the challenges of the post-petascale and exascale era, especially those brought by manycore processors built from ordinary (i.e., non-GPU type) CPU cores. Although a processor such as the Intel Xeon Phi offers TFlops-class computational power and may lead us to exascale computing, fully exploiting its potential is far from easy because of the very sources of its high performance, namely large-scale multithreading and a wide SIMD mechanism. In fact, for the three tiers of parallelism (inter-node, intra-node and intra-core), we found that this order does not reflect the difficulty of HPC programming; the order must be reversed to do so, with intra-core parallelism being the toughest. Our case study with a particle-in-cell plasma simulation code supports this observation: simply porting an existing code to Xeon Phi is infeasible from the viewpoint of performance, and the code structure must be changed significantly so that it conforms to the features of the processor. However, the study also confirms that the recoding effort is well rewarded, achieving a single-node performance higher than that obtained on four dual-socket nodes of a Cray XE6.

Keywords

Multithreading; Particle-in-cell simulation; High-performance computing