Transforming the multifluid PPM algorithm to run on GPUs

Article code	Journal code	Publication year	English article	Full-text version
432654	689006	2016	10-page PDF	Free download

Keywords

Code transformation; GPU computation; Computational fluid dynamics

Related topics

Engineering and Basic Sciences; Computer Engineering; Computational Theory and Mathematics

Abstract

• An optimization for the limited on-chip workspace of GPUs.
• A user-controllable trade-off between workspace size and redundant computation.
• Automatic translators that apply the optimizations.
• Speedups of 1.7 to 2.4 times over the CPU systems.
• Performance superior or comparable to that of other CFD codes running on GPUs.

In the past several years, there has been much success in adapting numerical algorithms involving linear algebra and pairwise N-body force calculations to run well on GPUs. These numerical algorithms share the feature that high computational intensity can be achieved while holding only small amounts of data in on-chip storage. In previous work, we combined a briquette data structure and a heavily pipelined CFD processing of these data briquettes in sequence that results in a very small on-chip data workspace and high performance for our multifluid PPM gas dynamics algorithm on CPUs with standard sized caches. The on-chip data workspace produced in that earlier work is not small enough to meet the requirements of today’s GPUs, which demand that no more than 32 kB of on-chip data be associated with a single thread of control (a warp). Here we report a variant of our earlier technique that allows a user-controllable trade-off between workspace size and redundant computation that can be a win on GPUs. We use our multifluid PPM gas dynamics algorithm to illustrate this technique. Performance results for this algorithm in 32-bit precision on a recently introduced dual-chip GPU, the Nvidia K80, are 1.7 times that on a similarly recent dual CPU node using two 16-core Intel Haswell chips. The redundant computation that allows the on-chip data context for each thread of control to be less than 32 kB is roughly 9% of the total. We have built an automatic translator from a Fortran expression to CUDA to ease the programming burden that is involved in applying our technique.
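
The trade-off the abstract describes, a smaller workspace bought with some redundant computation, can be illustrated with a toy cost model. The Python sketch below is not the paper's briquette scheme: it assumes a generic 1-D overlapped tiling in which each tile recomputes `halo` ghost cells on each side instead of storing them. The function names, the halo width of 4, and the 16 variables per cell are hypothetical values chosen only for illustration; the 32 kB budget and 32-bit (4-byte) values are the only figures taken from the abstract.

```python
def tile_cost(tile_width, halo, n_vars, bytes_per_value=4):
    """Return (workspace_bytes, redundant_fraction) for one overlapped tile.

    Toy model (an assumption, not the paper's scheme): a tile holds
    tile_width useful cells plus `halo` recomputed ghost cells per side.
    """
    cells = tile_width + 2 * halo
    workspace_bytes = cells * n_vars * bytes_per_value
    redundant_fraction = (2 * halo) / cells  # share of work that is recomputed
    return workspace_bytes, redundant_fraction


def largest_tile_within(budget_bytes, halo, n_vars, bytes_per_value=4):
    """Widest tile whose workspace fits the budget, or None if none fits."""
    max_cells = budget_bytes // (n_vars * bytes_per_value)
    width = max_cells - 2 * halo
    return width if width >= 1 else None


if __name__ == "__main__":
    budget = 32 * 1024  # 32 kB on-chip limit quoted in the abstract
    for width in (8, 32, 128, 512):
        ws, frac = tile_cost(width, halo=4, n_vars=16)  # hypothetical sizes
        status = "fits" if ws <= budget else "too big"
        print(f"width={width:4d}  workspace={ws:6d} B  redundant={frac:.1%}  {status}")
```

In this model, narrowing the tile shrinks the workspace but raises the redundant share of the work, so a user-chosen tile width plays the role of the controllable trade-off. The real algorithm's workspace and redundancy figures depend on details the abstract does not give.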

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volumes 93–94, July 2016, Pages 56–65

Authors

Pei-Hung Lin, Paul R. Woodward
