| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 762579 | Computers & Fluids | 2012 | 6 Pages | |
In this paper, a hybrid programming approach (OpenMP, MPI, and CUDA) is used to study the performance of a parallelized Dynamic Discrete Ordinate Method (DDOM) solver [1]. Parallel performance is compared under different scenarios. Hybrid MPI and OpenMP parallelism achieves high parallel efficiency (>90%) on a 64-core CPU cluster without any load-balancing technique. This hybrid parallelism model is then extended to a GPU cluster. Using massively multicore GPUs, the CUDA-accelerated code runs 250 times faster on a single GPU, and over 780 times faster on a quad-GPU cluster, than the identical process running on a single CPU thread. Our results demonstrate that the DDOM solver scales well on both CPU and GPU clusters.
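
The reported figures can be related to one another with the usual definitions of speedup and parallel efficiency (efficiency = speedup / number of workers). The sketch below is only illustrative arithmetic on the numbers quoted in the abstract. It assumes that the >90% CPU efficiency is measured against a single core and treats "over 780" as a lower bound; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Standard definition: achieved speedup divided by the number of workers."""
    return speedup / workers


# Figures quoted in the abstract (speedups relative to one CPU thread).
single_gpu_speedup = 250.0
quad_gpu_speedup = 780.0   # reported as "over 780", so treated as a lower bound

# Scaling from one GPU to four GPUs, and the implied multi-GPU efficiency.
gpu_scaling = quad_gpu_speedup / single_gpu_speedup
gpu_efficiency = parallel_efficiency(gpu_scaling, 4)

# Assumption: >90% efficiency on 64 cores is measured against a single core,
# which would imply a speedup above this value.
cpu_min_speedup = 0.90 * 64

print(f"1 -> 4 GPU scaling: at least {gpu_scaling:.2f}x")
print(f"implied 4-GPU efficiency: at least {gpu_efficiency:.0%}")
print(f"implied 64-core speedup: above {cpu_min_speedup:.1f}x")
```

Under these assumptions, going from one to four GPUs yields at least a 3.12x gain (about 78% multi-GPU efficiency), while the 64-core CPU run corresponds to a speedup above 57.6x.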
