Article code: 432144
Journal code: 688719
Publication year: 2008
English article: 10-page PDF
Full-text version: free download
English title of the ISI article
Fine-grained parallelization of lattice QCD kernel routine on GPUs
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract

Simulation time for the classical problem of Lattice Quantum Chromodynamics (Lattice QCD) is dominated by one kernel routine responsible for computing the actions of a Dirac operator. This paper describes an experience in parallelizing this kernel routine. We explore parallelization granularities for this kernel routine on Graphical Processing Units (GPUs). We show that fine-grained parallelism can outperform coarse-grained parallelization, given that control-flow and communication effects are minimized. We propose two techniques for transforming control-flow-based code to control-free code. We also show how to reduce the communication effect by optimizing for commonly used sequences of calls to this routine. In our implementation on NVIDIA 8800 GTX, we were able to achieve an 8.3x speedup over an SSE2 optimized version on 2.8 GHz Intel Xeon CPU.
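
The abstract does not name its two control-flow-removal techniques, so the sketch below is only a generic illustration of the idea, not the authors' method. It shows two common ways to replace a data-dependent branch with control-free code: folding the condition into arithmetic, and indexing a small lookup table. On GPUs this matters because threads in the same warp that take different branches are executed serially. All function and variable names here are hypothetical.

```python
# Generic illustration only; the source abstract does not specify
# which transformations the paper actually uses.

def combine_branchy(a, b, parity):
    """Control-flow version: the operation is chosen by a branch."""
    if parity == 0:
        return a + b
    else:
        return a - b


def combine_arithmetic(a, b, parity):
    """Control-free variant 1: fold the condition into arithmetic."""
    sign = 1 - 2 * parity  # parity 0 -> +1, parity 1 -> -1
    return a + sign * b


SIGN_TABLE = (1, -1)  # indexed by parity (hypothetical table)


def combine_table(a, b, parity):
    """Control-free variant 2: replace the branch with a table lookup."""
    return a + SIGN_TABLE[parity] * b


if __name__ == "__main__":
    # All three versions should agree for both parity values.
    for parity in (0, 1):
        results = (
            combine_branchy(5, 3, parity),
            combine_arithmetic(5, 3, parity),
            combine_table(5, 3, parity),
        )
        print(parity, results)
```

Both control-free variants assume `parity` is exactly 0 or 1; any other value would need to be normalized first.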

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 68, Issue 10, October 2008, Pages 1350–1359
Authors
, , ,