کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
10358587 | 868543 | 2014 | 13 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
A comparison of CPU and GPU implementations for solving the Convection Diffusion equation using the local Modified SOR method
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
نرم افزارهای علوم کامپیوتر
پیش نمایش صفحه اول مقاله

چکیده انگلیسی
In this paper we study a parallel form of the SOR method for the numerical solution of the Convection Diffusion equation suitable for GPUs using CUDA. To exploit the parallelism offered by GPUs we consider the fine grain parallelism model. This is achieved by considering the local relaxation version of SOR. More specifically, we use SOR with red-black ordering using two sets of parameters Ï1ij and Ï2ij for the 5 point stencil. The parameter Ï1ij is associated with each red (i + j even) grid point (i,j), whereas the parameter Ï2ij is associated with each black (i+j odd) grid point (i,j). The use of a parameter for each grid point avoids the global communication required in the adaptive determination of the best value of Ï and also increases the convergence rate of the SOR method (Varga, 1962) [38] and (Young, 1971) [41]. We present our strategy and the results of our effort to exploit the computational capabilities of GPUs under the CUDA environment. Additionally, two parallel CPU programs utilizing manual SSE2 (Streaming SIMD Extensions 2) and AVX (Advanced Vector Extensions) vectorization were developed as performance references. The optimizations applied on the GPU version were also considered for the CPU version. Significant performance improvement was achieved with all three developed GPU kernels differentiated by the degree of recomputations thus affecting the flops per element access ratio.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Parallel Computing - Volume 40, Issue 7, July 2014, Pages 173-185
Journal: Parallel Computing - Volume 40, Issue 7, July 2014, Pages 173-185
نویسندگان
Yiannis Cotronis, Elias Konstantinidis, Maria A. Louka, Nikolaos M. Missirlis,