Article code: 432444
Journal code: 688896
Publication year: 2012
English article: 10-page PDF
Full-text version: Free download
English title of the ISI article
Fat-tree routing and node ordering providing contention free traffic for MPI global collectives
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract

As the size of High Performance Computing clusters grows, so does the probability of interconnect hot spots that degrade the latency and effective bandwidth the network provides. This paper presents a solution to this scalability problem for real-life constant bisectional-bandwidth fat-tree topologies. It is shown that maximal bandwidth and cut-through latency can be achieved for MPI global collective traffic. To form such a congestion-free configuration, MPI programs should utilize collective communication, the MPI node order should be topology-aware, and the packet routing should match the MPI communication patterns. First, we show that MPI collectives can be classified into unidirectional and bidirectional shifts. Using this property, we propose a scheme for congestion-free routing of the global collectives in fully and partially populated fat-trees running a single job. The no-contention result is then obtained for multiple jobs running on the same fat-tree by applying some job size and placement restrictions. Simulation results of the proposed routing, MPI node order and communication patterns show no contention, which provides a 40% throughput improvement over previously published results for all-to-all collectives.
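
The classification into unidirectional and bidirectional shifts can be illustrated with a small sketch. The function names below are hypothetical, and the two patterns (a ring stage and a recursive-doubling stage) are assumed here as typical collective building blocks rather than taken from the paper. The sketch only shows that the first uses a single displacement while the second uses a displacement and its negative.

```python
def ring_stage(n, s):
    """Unidirectional shift: rank i sends to rank (i + s) mod n."""
    return [(i + s) % n for i in range(n)]


def recursive_doubling_stage(n, j):
    """Pairwise exchange at stage j: rank i talks to rank i XOR 2**j.

    Assumes n is a power of two, so every partner is a valid rank.
    """
    return [i ^ (1 << j) for i in range(n)]


def displacements(perm):
    """Set of (destination - source) mod n values used by a pattern."""
    n = len(perm)
    return {(dst - src) % n for src, dst in enumerate(perm)}


# A ring stage uses one fixed displacement.
print(displacements(ring_stage(8, 3)))                # {3}
# A recursive-doubling stage uses +2 and -2 (which is 6 mod 8).
print(displacements(recursive_doubling_stage(8, 1)))  # {2, 6}
```

In this reading, a bidirectional shift is one where some ranks move by +s and the rest by -s, which is what the XOR exchange produces.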


► All known medium or large message MPI collectives use fixed displacement permutations.
► XGFT fat-trees are extended to enable parallel ports between switches, thereby defining PGFT.
► D-Mod-K routing for PGFTs is formulated.
► D-Mod-K is proven contention-free for all MPI global collectives when a single job runs on a PGFT.
► Concurrent jobs have the above property if their sizes and placement meet some rules.
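
The contention-free claim can be checked on a toy case. The sketch below is a simplification under stated assumptions, not the paper's PGFT formulation: it models a two-level fat-tree with k leaf switches, k hosts per leaf and k spine switches, numbers hosts consecutively within each leaf (a topology-aware order), and picks the spine as the destination modulo k. The function names are hypothetical.

```python
from collections import Counter


def shift(n, s):
    """Shift permutation: rank i sends to rank (i + s) mod n."""
    return [(i + s) % n for i in range(n)]


def max_link_load(k, perm):
    """Largest number of flows sharing one leaf-spine link.

    Assumed topology: k leaves, k hosts per leaf, k spines.
    Assumed routing: the up-going port (spine) is dst mod k.
    """
    load = Counter()
    for src, dst in enumerate(perm):
        src_leaf, dst_leaf = src // k, dst // k
        if src_leaf == dst_leaf:
            continue  # traffic stays inside the leaf switch
        spine = dst % k
        load[("up", src_leaf, spine)] += 1
        load[("down", spine, dst_leaf)] += 1
    return max(load.values(), default=0)


k = 4
n = k * k
# Check every non-trivial shift on the 16-host toy tree.
worst = max(max_link_load(k, shift(n, s)) for s in range(1, n))
print(worst)  # 1: no link carries more than one flow
```

The result follows because the k sources under one leaf send to k consecutive destinations, which all differ modulo k, and each destination maps to its own spine-to-leaf link.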

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 72, Issue 11, November 2012, Pages 1423–1432
Authors
,