Article ID | Journal ID | Year | English article |
---|---|---|---|
523875 | 868516 | 2015 | 18-page PDF |

• We present a performance model for representing and predicting the cost of parallel algorithms.
• The model's goal is to help in the design and optimization of parallel collective algorithms.
• It is applied to the collective algorithms of mainstream MPI implementations on shared memory.
• The model is compared to other well-known, established models.
Formal modeling of the cost of MPI primitives allows a machine-independent representation, comparison and performance analysis of their underlying algorithms. Currently accepted methods are all offspring of LogP, which was conceived to model the cost of inter-node point-to-point messages in networks of single-processor machines. As new supercomputers are built upon cheap commodity boards with a growing number of cores accessing hierarchical memories, intra-node communication becomes progressively more relevant. Techniques for shared-memory communication, such as message segmentation and collectives that are not based on point-to-point operations, are substantively different from their inter-node counterparts. This paper explains why LogGP and the most recent models in this domain, lognP and mlognP, fit poorly, and proposes a new model named τ-Lop, rooted in them but addressing the challenge of accurately modeling shared-memory MPI communications. Broadcast algorithms of mainstream MPI implementations, MPICH and Open MPI, are modeled and analyzed.
Journal: Parallel Computing - Volume 46, July 2015, Pages 14–31
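For readers unfamiliar with the LogP family that the abstract above takes as its starting point, the sketch below shows the classical LogGP estimate for one k-byte point-to-point message (sender overhead o, plus (k−1) per-byte gaps G, plus latency L, plus receiver overhead o), and a common simplified round-based estimate for a binomial-tree broadcast built from such messages. This is a hypothetical illustration of the baseline model only, not the paper's τ-Lop formulation; the function names are made up, and the broadcast estimate deliberately ignores the per-message gap g, contention and the shared-memory effects (such as segmentation) that motivate τ-Lop.

```python
import math


def loggp_p2p(k: int, L: float, o: float, G: float) -> float:
    """Hypothetical helper: LogGP time for one k-byte point-to-point message.

    Sender overhead + (k - 1) per-byte gaps + network latency + receiver overhead.
    """
    if k < 1:
        raise ValueError("message size must be at least 1 byte")
    return o + (k - 1) * G + L + o


def binomial_bcast(P: int, k: int, L: float, o: float, G: float) -> float:
    """Hypothetical helper: simplified binomial-tree broadcast estimate.

    Assumes ceil(log2 P) synchronous rounds, each costing one full
    point-to-point message; ignores the gap g between consecutive sends
    from the same process and any contention for shared resources.
    """
    if P <= 1:
        return 0.0  # a single process needs no communication
    rounds = math.ceil(math.log2(P))
    return rounds * loggp_p2p(k, L, o, G)


if __name__ == "__main__":
    # Illustrative parameter values only (arbitrary time units), not measurements.
    print(loggp_p2p(101, L=5.0, o=1.0, G=0.5))
    print(binomial_bcast(8, 101, L=5.0, o=1.0, G=0.5))
```

With these illustrative values a single 101-byte message costs 57 time units, and an 8-process broadcast takes three rounds of that cost. A model of this shape treats every transfer as an independent network message, which is precisely the assumption the abstract argues breaks down for intra-node, shared-memory communication.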