Article code: 566260
Journal code: 875958
Publication year: 2011
English article: 7 pages, PDF
Full-text version: free download
English title of the ISI article
Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA
Related topics
Engineering and Basic Sciences, Computer Engineering, Software
English abstract

This paper presents implementation strategies and optimization approaches for a D3Q19 lattice Boltzmann flow solver on nVIDIA graphics processing units (GPUs). Using the STREAM benchmarks we demonstrate the GPU parallelization approach and obtain an upper limit for the flow solver performance. We discuss the GPU-specific implementation of the solver with a focus on memory alignment and register shortage. The optimized code is up to an order of magnitude faster than standard two-socket x86 servers with AMD Barcelona or Intel Nehalem CPUs. We further analyze data transfer rates for the PCI-express bus to evaluate the potential benefits of multi-GPU parallelism in a cluster environment.
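
As a rough illustration of how a STREAM-type bandwidth measurement can bound solver performance, the sketch below assumes a purely memory-bound kernel in which every cell update loads and stores each of the 19 distribution functions exactly once. The function name `max_mlups`, the bytes per value, and the 100 GB/s bandwidth are illustrative assumptions, not figures taken from the paper.

```python
# Bandwidth-bound estimate for a D3Q19 lattice Boltzmann kernel.
# Assumption: one load and one store per distribution function per cell
# update, with no additional memory traffic.

Q = 19  # number of discrete velocities in the D3Q19 model


def max_mlups(bandwidth_gb_per_s: float, bytes_per_value: int = 4) -> float:
    """Upper limit in million lattice updates per second (MLUPS)."""
    bytes_per_update = 2 * Q * bytes_per_value  # load + store
    updates_per_second = bandwidth_gb_per_s * 1e9 / bytes_per_update
    return updates_per_second / 1e6


# Hypothetical sustained bandwidth of 100 GB/s, chosen only for illustration.
single = max_mlups(100.0, bytes_per_value=4)
double = max_mlups(100.0, bytes_per_value=8)
print(f"single precision: {single:.0f} MLUPS, double precision: {double:.0f} MLUPS")
```

Under these assumptions the limit scales linearly with the measured bandwidth and halves when moving from single to double precision, so a measured copy bandwidth translates directly into a ceiling for the solver.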

Publisher
Database: Elsevier - ScienceDirect
Journal: Advances in Engineering Software - Volume 42, Issue 5, May 2011, Pages 266–272