Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
455223 | Computers & Electrical Engineering | 2015 | 14 Pages |
• The difficulty of inter-node, intra-node and intra-core parallelism is discussed.
• The change brought about by manycore processors is exemplified with a PIC simulation code.
• A manycore- and SIMD-aware implementation improves performance 10-fold.
This paper discusses the challenges of the post-Peta and Exascale era, especially those brought by manycore processors built from ordinary (i.e., non-GPU-type) CPU cores. Although a processor such as the Intel Xeon Phi delivers TFlops-class computational power and may lead us to Exascale computing, fully exploiting its potential is far from easy because of the very sources of its high performance, namely large-scale multithreading and a wide SIMD mechanism. In fact, for the three tiers of parallelism (inter-node, intra-node and intra-core), we found that this order does not reflect their difficulty in HPC programming; rather, the order should be reversed. Our case study with a particle-in-cell plasma simulation code supports this observation: a simple port of an existing code to Xeon Phi is infeasible from the viewpoint of performance, and the code structure must be changed significantly so that it conforms to the features of the processor. However, the study also confirms that the recoding effort is well rewarded, achieving a single-node performance higher than that obtained on four dual-socket nodes of a Cray XE6.
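As a rough illustration of the intra-node and intra-core tiers named in the abstract, the following minimal C sketch combines OpenMP multithreading with SIMD vectorization in a single particle-update loop. It is not taken from the paper: the `particles_soa` type, the `push_particles` function, and the uniform-field update are hypothetical simplifications. They are assumed here only to show why a structure-of-arrays layout suits wide SIMD units.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure-of-arrays particle store (an assumption for this
   sketch, not the paper's data structure): one contiguous array per
   component, so consecutive particles map onto consecutive SIMD lanes. */
typedef struct {
    size_t n;    /* number of particles */
    double *x;   /* positions           */
    double *v;   /* velocities          */
} particles_soa;

/* Advance every particle by one time step dt under a uniform field e,
   with charge-to-mass ratio qm. Deliberately simplified: a real PIC code
   also interpolates fields from the grid and deposits current. */
void push_particles(particles_soa *p, double qm, double e, double dt)
{
    assert(p != NULL && p->x != NULL && p->v != NULL);

    double *restrict x = p->x;
    double *restrict v = p->v;
    const size_t n = p->n;

    /* Intra-node tier: iterations are divided among threads.
       Intra-core tier: each thread's chunk is vectorized.
       Without OpenMP support the pragma is ignored and the loop runs
       serially with the same result. */
    #pragma omp parallel for simd schedule(static)
    for (size_t i = 0; i < n; ++i) {
        v[i] += qm * e * dt;   /* accelerate */
        x[i] += v[i] * dt;     /* move       */
    }
}
```

Each iteration touches only element `i` of each array, so the loop is free of cross-iteration dependences and the unit-stride accesses can be loaded directly into vector registers. An array-of-structures layout would instead require strided or gathered loads, which is one reason a straightforward port may vectorize poorly.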