Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms

Article code	Journal code	Publication year	English article	Full-text version
524677	868824	2011	20-page PDF	Free download

Keywords

Code optimization; Particle-in-cell; Fermi; Manycore; Graphics processing units; Multicore

Related topics

Engineering and basic sciences; Computer engineering; Computer science software

Abstract

The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community. In this work, we examine the efficient multicore optimization of GTC, a petascale gyrokinetic toroidal fusion code for studying plasma microturbulence in tokamak devices. For GTC’s key computational components (charge deposition and particle push), we explore efficient parallelization strategies across a broad range of emerging multicore designs, including the recently-released Intel Nehalem-EX, the AMD Opteron Istanbul, and the highly multithreaded Sun UltraSparc T2+. We also present the first study on tuning gyrokinetic particle-in-cell (PIC) algorithms for graphics processors, using the NVIDIA C2050 (Fermi). Our work discusses several novel optimization approaches for gyrokinetic PIC, including mixed-precision computation, particle binning and decomposition strategies, grid replication, SIMDized atomic floating-point operations, and effective GPU texture memory utilization. Overall, we achieve significant performance improvements of 1.3–4.7× on these complex PIC kernels, despite the inherent challenges of data dependency and locality. Our work also points to several architectural and programming features that could significantly enhance PIC performance and productivity on next-generation architectures.
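Two of the techniques named in the abstract, particle binning and grid replication, can be illustrated with a small serial sketch. It is not taken from GTC: the function names, the 1D periodic grid, the unit charges, and the round-robin "worker" assignment are all assumptions made for illustration, and the loop runs sequentially to emulate what separate threads would do.

```python
def bin_particles(positions, dx=1.0):
    """Particle binning: reorder particles by the grid cell they fall in,
    so that consecutive particles touch nearby grid memory."""
    return sorted(positions, key=lambda x: int(x // dx))


def deposit_replicated(positions, num_cells, num_workers, dx=1.0):
    """Grid replication: every (hypothetical) worker owns a private grid copy,
    so deposits never conflict; the copies are summed at the end."""
    replicas = [[0.0] * num_cells for _ in range(num_workers)]
    for i, x in enumerate(positions):
        worker = i % num_workers          # round-robin stand-in for a thread ID
        cell = int(x // dx) % num_cells   # lower-cell deposit of a unit charge
        replicas[worker][cell] += 1.0     # private copy: no synchronization needed
    # Reduction step: combine the private copies into one shared grid.
    return [sum(column) for column in zip(*replicas)]


particles = [2.5, 0.1, 3.9, 0.7, 2.2]
binned = bin_particles(particles)
grid = deposit_replicated(binned, num_cells=4, num_workers=2)
```

The trade-off this sketch exposes is the usual one: replication removes write conflicts at the cost of extra memory and a final reduction, while binning costs a sort but improves locality of the grid accesses.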

Research highlights
► We investigate optimization of GTC’s two principal phases (charge and push).
► Charge is a stream plus scatter-add kernel while push is a stream plus gather kernel.
► Studied platforms are multicore CPUs and NVIDIA GPUs.
► Performance requires specialized data synchronization, data replication, and data locality.
► Optimized multicore CPU performance can surpass optimized GPU performance.
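The distinction between a scatter-add kernel (charge) and a gather kernel (push) can be sketched as follows. This is a deliberately simplified 1D example with linear weighting on a periodic grid, not GTC's gyro-averaged toroidal scheme; the function names and parameters are hypothetical, and only the field-lookup part of a push step is shown.

```python
import math


def deposit_charge(positions, charges, num_cells, dx=1.0):
    """Charge phase: stream over particles and scatter-add onto the grid.
    In a parallel run, two particles may update the same grid point, which
    is why this phase needs atomics, locks, or replicated grids."""
    grid = [0.0] * num_cells
    for x, q in zip(positions, charges):
        cell = int(math.floor(x / dx))
        frac = x / dx - cell                     # offset inside the cell, in [0, 1)
        grid[cell % num_cells] += q * (1.0 - frac)
        grid[(cell + 1) % num_cells] += q * frac
    return grid


def gather_field(positions, field, dx=1.0):
    """Push phase (field lookup only): stream over particles and gather
    from the grid. Reads are conflict-free, but still irregular in memory."""
    n = len(field)
    values = []
    for x in positions:
        cell = int(math.floor(x / dx))
        frac = x / dx - cell
        values.append(field[cell % n] * (1.0 - frac) + field[(cell + 1) % n] * frac)
    return values


rho = deposit_charge([0.5, 1.25, 3.75], [1.0, 1.0, 1.0], num_cells=4)
e_at_particles = gather_field([0.5, 1.25], [0.0, 1.0, 2.0, 3.0])
```

Both loops stream through the particle arrays once; the difference is that the first writes to data-dependent grid locations while the second only reads from them.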

Publisher

Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 37, Issue 9, September 2011, Pages 501–520

Authors

Kamesh Madduri, Eun-Jin Im, Khaled Z. Ibrahim, Samuel Williams, Stéphane Ethier, Leonid Oliker
