Highly scalable implementation of an N-body code on a GPU cluster

Article ID	Journal ID	Publication year	English article
502651	863714	2013	10-page PDF

Keywords

GPGPU; Performance tuning; Performance modeling; CUDA (parallel processing and programming model)

Related subjects

Engineering and Basic Sciences; Chemistry; Theoretical and Practical Chemistry

Abstract

We have developed a highly optimized code for collisionless N-body calculations based on direct summation. Our new optimization hides the global memory access latency, and the resulting CUDA code has a peak performance of 1006.7 GFlop/s in single precision (assuming 26 floating-point operations per interaction) with a single NVIDIA Tesla M2090 board. To improve the scalability of the OpenMP/MPI hybrid parallelized code, we have reduced the number of communications among multiple GPUs and have overlapped communications with computations to hide communication time. The code's performance was measured on HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences), a recently installed GPGPU cluster at the University of Tsukuba. The results show excellent scalability, with superlinear scaling when the number of N-body particles per GPU is less than 10^4 and parallel efficiency approaching unity when the number of N-body particles per GPU is greater than 10^4. The CUDA/OpenMP/MPI code has a peak performance of 255.5 TFlop/s when 256 NVIDIA Tesla M2090 boards are used, which is 75.0% of the theoretical peak performance.
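To make the performance figures in the abstract concrete, the following is a minimal Python sketch of direct summation and of the flop accounting the abstract describes. It is not the authors' CUDA code. The Plummer softening length `eps`, the unit gravitational constant `G`, and the function names `direct_accelerations`, `gflops`, and `parallel_efficiency` are assumptions introduced here for illustration. Only the figure of 26 floating-point operations per interaction and the quoted performance numbers come from the abstract.

```python
import math

# Convention stated in the abstract: 26 floating-point operations per interaction.
FLOPS_PER_INTERACTION = 26


def direct_accelerations(pos, mass, eps=1e-3, G=1.0):
    """O(N^2) direct summation of gravitational accelerations.

    Plummer softening with eps > 0 is an assumption of this sketch; it also
    makes the self-interaction (i == j) contribute exactly zero.
    """
    n = len(pos)
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            # G * m_j / (r^2 + eps^2)^(3/2)
            w = G * mass[j] / (r2 * math.sqrt(r2))
            ax += w * dx
            ay += w * dy
            az += w * dz
        acc.append([ax, ay, az])
    return acc


def gflops(n_i, n_j, seconds):
    """Sustained GFlop/s for n_i * n_j interactions computed in `seconds`."""
    return FLOPS_PER_INTERACTION * n_i * n_j / seconds / 1e9


def parallel_efficiency(aggregate, n_gpus, single_gpu):
    """Aggregate performance divided by n_gpus times the single-GPU figure."""
    return aggregate / (n_gpus * single_gpu)


if __name__ == "__main__":
    # Two unit masses one length unit apart attract with |a| close to 1.
    print(direct_accelerations([[0, 0, 0], [1, 0, 0]], [1.0, 1.0]))
    # Ratio of the 256-board figure to 256 times the single-board figure.
    print(parallel_efficiency(255.5e3, 256, 1006.7))
```

Taking the two peak figures from the abstract at face value, 255.5 TFlop/s on 256 boards is about 99% of 256 times the single-board 1006.7 GFlop/s. This ratio is only an illustration of the arithmetic and is not necessarily how the authors define parallel efficiency.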

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Physics Communications - Volume 184, Issue 9, September 2013, Pages 2159–2168

Authors

Yohei Miki, Daisuke Takahashi, Masao Mori
