Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
4968211 | 1449561 | 2017 | 21-page PDF | Free download |
- We show that message pipelining can be used to construct multicast operations.
- Using multicast operations we construct an extension of recursive doubling.
- We show that recursive multiplying outperforms recursive doubling significantly.
- We created a simulator to better understand the connection between the theoretical exploration and the experimental results.
The performance of AllReduce is crucial at scale. Recursive doubling with pairwise exchange theoretically achieves O(log₂ N) scaling for short messages among N peers, but further gains depend on improvements in network latency. A multi-way exchange can instead be implemented using message pipelining, which is easier to improve than latency. With our method, recursive multiplying, AllReduce execution time on a Cray XC30 is reduced by between 8% and 40% relative to recursive doubling. Using a custom simulator, we further explore the dynamics of recursive multiplying.
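
To make the round-count argument in the abstract concrete, the following is a minimal sketch that simulates a sum-AllReduce in a single process. It is not the authors' implementation: the function name `simulate_allreduce`, the radix parameter `k`, and the restriction that N be an exact power of `k` are all assumptions made for illustration. With `k = 2` each peer exchanges with one partner per round (recursive doubling); with larger `k` each peer exchanges with `k - 1` partners per round, which is the multi-way idea behind recursive multiplying.

```python
def simulate_allreduce(values, k=2):
    """Simulate a sum-AllReduce over len(values) peers with radix-k exchanges.

    Hypothetical helper for illustration only. Assumes the number of peers
    is an exact power of k; a real implementation would need to handle
    other peer counts.
    Returns (per-peer results, number of communication rounds).
    """
    if k < 2:
        raise ValueError("radix k must be at least 2")
    n = len(values)

    # Count rounds and check that n is an exact power of k.
    rounds, size = 0, 1
    while size < n:
        size *= k
        rounds += 1
    if size != n:
        raise ValueError("this sketch assumes N is an exact power of k")

    current = list(values)
    stride = 1
    for _ in range(rounds):
        nxt = []
        for rank in range(n):
            # Peers in this round differ from `rank` only in one base-k digit.
            digit = (rank // stride) % k
            base = rank - digit * stride
            group = [base + d * stride for d in range(k)]
            # Each peer combines its own value with those of its k - 1 partners.
            nxt.append(sum(current[p] for p in group))
        current = nxt
        stride *= k
    return current, rounds


if __name__ == "__main__":
    data = list(range(16))
    for radix in (2, 4):
        result, rounds = simulate_allreduce(data, k=radix)
        print(f"k={radix}: rounds={rounds}, every peer holds {result[0]}")
```

For 16 peers the sketch needs four rounds at radix 2 and two rounds at radix 4. Fewer rounds means fewer latency-bound steps, at the cost of more messages per round, which is where message pipelining matters. The sketch models only the data flow and round count, not network timing.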
Journal: Parallel Computing - Volume 69, November 2017, Pages 24-44