Article code | Journal code | Publication year | English article | Full-text version
768452 | 1462717 | 2014 | 12-page PDF | Free download
English title of the ISI article
High order accurate simulation of compressible flows on GPU clusters over Software Distributed Shared Memory
Related topics
Engineering and Basic Sciences > Other Engineering Disciplines > Computational Mechanics
Abstract


• We evaluate shared memory abstraction on GPU clusters with a CFD application.
• A two-level hierarchical domain decomposition takes place on a structured grid.
• The application code corresponds to a fully 3D high order accurate WENO scheme.
• Two implementation schemes are proposed based on cluster-enabled OpenMP and Java.
• The proposed schemes are compared against MPI and CUDA versions.

The advent of multicore processors during the past decade, and especially the recent introduction of many-core Graphics Processing Units (GPUs), open new horizons for large-scale, high-resolution simulations in a broad range of scientific fields. Residing at the forefront of advancements in multiprocessor technology, GPUs are often chosen as co-processors for the compute-intensive parts of applications. Among the various domains, Computational Fluid Dynamics (CFD) is a candidate that could benefit significantly from many-core GPUs. To investigate this possibility, we herein evaluate the performance of a high order accurate method for the simulation of compressible flows.

Targeting computer systems with multiple GPUs, the current implementation and the respective performance evaluation take place on a GPU cluster. With respect to using these GPUs, this paper offers an alternative to the mainstream approach of message passing by considering shared memory abstraction. In the implementations presented in this paper, updates to shared data are not explicitly coded by the programmer across the simulation phases, but are propagated through Software Distributed Shared Memory (SDSM). In this way, we intend to preserve a unified memory view that extends the memory hierarchy from the node level to the cluster level. Such an extension could significantly facilitate the porting of multithreaded codes to GPU clusters. Our results indicate that the presented approach is competitive with the message passing paradigm, and they lay the groundwork for further research on the use of shared memory abstraction for future GPU clusters.
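
The two-level hierarchical domain decomposition listed in the highlights can be illustrated with a minimal sketch. The abstract does not say how the paper partitions the grid, so everything here is an assumption made for illustration: the function names `split_range` and `two_level_decomposition`, the choice to split along z across cluster nodes and along y across the GPUs of each node, and the grid and cluster sizes in the usage example.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks.

    Chunk sizes differ by at most one cell, so the load stays balanced.
    """
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def two_level_decomposition(nx, ny, nz, num_nodes, gpus_per_node):
    """Assign a block of a structured nx*ny*nz grid to every (node, gpu) pair.

    Level 1 (assumed): slabs along z, one per cluster node.
    Level 2 (assumed): each slab is cut along y, one piece per GPU.
    """
    layout = {}
    for node, (z0, z1) in enumerate(split_range(nz, num_nodes)):
        for gpu, (y0, y1) in enumerate(split_range(ny, gpus_per_node)):
            layout[(node, gpu)] = {"x": (0, nx), "y": (y0, y1), "z": (z0, z1)}
    return layout


def block_cells(block):
    """Number of grid cells contained in one block."""
    cells = 1
    for lo, hi in block.values():
        cells *= hi - lo
    return cells


if __name__ == "__main__":
    # Hypothetical configuration: 64^3 grid, 4 nodes, 2 GPUs per node.
    layout = two_level_decomposition(64, 64, 64, num_nodes=4, gpus_per_node=2)
    for key in sorted(layout):
        print(key, layout[key])
```

In a message passing version, the cells on the faces between neighbouring blocks would be exchanged explicitly after each phase; under the shared memory abstraction described above, those updates would instead be propagated by the SDSM layer.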

Publisher
Database: Elsevier - ScienceDirect
Journal: Computers & Fluids - Volume 93, 10 April 2014, Pages 18–29
Authors
, , ,