کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
432392 | 688876 | 2013 | 14 صفحه PDF | دانلود رایگان |
![عکس صفحه اول مقاله: Efficient breadth first search on multi-GPU systems Efficient breadth first search on multi-GPU systems](/preview/png/432392.png)
• Simple distributed BFS algorithms lead load unbalance among threads.
• Communication among tasks quickly becomes an issue.
• We propose a novel technique for mapping threads to data.
• We perform a pruning operation on the set of edges exchanged at each BFS level.
• Our implementation can efficiently traverse graphs with billions of nodes.
Simple algorithms for the execution of a Breadth First Search on large graphs lead, running on clusters of GPUs, to a situation of load unbalance among threads and un-coalesced memory accesses, resulting in pretty low performances. To obtain a significant improvement on a single GPU and to scale by using multiple GPUs, we resort to a suitable combination of operations to rearrange data before processing them. We propose a novel technique for mapping threads to data that achieves a perfect load balance by leveraging prefix-sum and binary search operations. To reduce the communication overhead, we perform a pruning operation on the set of edges that needs to be exchanged at each BFS level. The result is an algorithm that exploits at its best the parallelism available on a single GPU and minimizes communication among GPUs. We show that a cluster of GPUs can efficiently perform a distributed BFS on graphs with billions of nodes.
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 9, September 2013, Pages 1292–1305