Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
433024 | 689211 | 2014 | 16-page PDF | Free download |

• We propose a library for characterizing and load balancing irregular applications.
• We characterize and optimize four irregular applications and analyze the results.
• CP has the highest irregularity of 3.41 and the lowest utilization of 29.4%.
• The centralized task pool achieves a 1.63× performance improvement on average and up to 2.72×.
• Larger improvements are achievable under lower utilization and abundant parallelism.
While Graphics Processing Units (GPUs) deliver high performance on problems with regular structures, they perform poorly on irregular tasks because irregular problem structures are mismatched with SIMD-like GPU architectures. In this paper, we introduce a new library, CUIRRE, for improving the performance of irregular applications on GPUs. CUIRRE reduces the load imbalance among GPU threads that results from irregular loop structures. In addition, CUIRRE can characterize irregular applications in terms of their irregularity, thread granularity, and GPU utilization. We employ this library to characterize and optimize both synthetic and real-world applications. The experimental results show that the centralized task pool approach in the library achieves a performance improvement of 1.63× on average and up to 2.76×, at a 4.57% average overhead with static loading ratios. To avoid the cost of exhaustively searching for loading ratios, we propose an adaptive loading ratio method that derives appropriate loading ratios for different inputs automatically at runtime. Our task pool approach outperforms other load balancing schemes such as the task stealing method and the persistent threads method. The CUIRRE library can easily be applied to many other irregular problems.
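The intuition behind a centralized task pool can be illustrated with a small CPU-side simulation. The sketch below is not CUIRRE's API or the paper's implementation; the function names, the per-task cost list, and the worker count are all hypothetical, and "utilization" here is simply total work divided by workers times makespan, which may differ from the paper's definition. It compares a static, up-front assignment of irregular tasks against workers that pull the next task from one shared pool whenever they become idle.

```python
import heapq


def static_makespan(costs, n_workers):
    """Each worker receives a fixed contiguous chunk of tasks up front."""
    chunk = -(-len(costs) // n_workers)  # ceiling division
    loads = [sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk)]
    return max(loads)


def task_pool_makespan(costs, n_workers):
    """Idle workers take the next task from a single shared pool."""
    finish = [0] * n_workers  # min-heap of per-worker finish times
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)      # worker that becomes idle first
        heapq.heappush(finish, earliest + cost)
    return max(finish)


def utilization(costs, n_workers, makespan):
    """Fraction of worker time spent on useful work (assumed definition)."""
    return sum(costs) / (n_workers * makespan)


# Hypothetical irregular workload: two long iterations, many short ones.
costs = [8, 8] + [1] * 14
n_workers = 4

static_t = static_makespan(costs, n_workers)
pool_t = task_pool_makespan(costs, n_workers)
print("static:", static_t, utilization(costs, n_workers, static_t))
print("pool:  ", pool_t, utilization(costs, n_workers, pool_t))
```

With this made-up input, the static split finishes in 18 time units because one worker receives both long tasks, while the shared pool finishes in 8. The simulation ignores the pool's synchronization overhead, which the abstract reports as a measured cost on real GPUs.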
Journal: Journal of Parallel and Distributed Computing - Volume 74, Issue 10, October 2014, Pages 2951–2966