Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
432361 | 688865 | 2014 | 11-page PDF | Free download |
![First page of the article: Research on the conjugate gradient algorithm with a modified incomplete Cholesky preconditioner on GPU](/preview/png/432361.png)
• We present a parallel method for forward/backward substitution on the GPU.
• We propose a new, highly optimized SpMV kernel for the GPU.
• We propose a new inner-product kernel for the GPU.
• We present an efficient MIC-preconditioned conjugate gradient algorithm on the GPU.
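
The two kernels named in the highlights, SpMV and the inner product, can be illustrated with a serial reference sketch. This is not the paper's GPU code: the function names `spmv_csr` and `dot_tree`, the CSR storage format, and the small test matrix are assumptions chosen for illustration. The pairwise reduction in `dot_tree` only mirrors the access pattern a GPU thread block typically uses in shared memory.

```python
# Serial reference sketch (assumed names and CSR layout; not the paper's kernels).

def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    Each row is independent, which is why a GPU version can assign
    one thread (or one warp) per row.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot_tree(a, b):
    """Inner product via pairwise (tree) reduction of the elementwise products."""
    partial = [ai * bi for ai, bi in zip(a, b)]
    if not partial:
        return 0.0
    n = len(partial)
    while n > 1:
        half = (n + 1) // 2
        # Fold the upper part of the active range onto the lower part.
        for i in range(n - half):
            partial[i] += partial[i + half]
        n = half
    return partial[0]


# Illustrative symmetric matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
xy = dot_tree(x, y)
```

In the sketch, the outer loop of `spmv_csr` and each pass of the `while` loop in `dot_tree` are the parts a GPU implementation would run in parallel.
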
In this study, we identify parallelism in the forward/backward substitutions (FBS) for two cases and, on that basis, propose an efficient preconditioned conjugate gradient algorithm with the modified incomplete Cholesky preconditioner on the GPU (GPUMICPCGA). The proposed GPUMICPCGA has three distinguishing characteristics: (1) the vector operations are optimized by grouping several of them into single kernels, (2) a new inner-product kernel and a new, highly optimized sparse matrix–vector multiplication kernel are presented, and (3) an efficient parallel implementation of FBS on the GPU (GPUFBS) is proposed for the two cases. Numerical results show that the proposed kernels outperform the corresponding ones in CUBLAS or CUSPARSE, and that GPUFBS is almost 3 times faster than the implementation of FBS using the CUSPARSE library. Furthermore, GPUMICPCGA performs better than its counterpart implemented with the CUBLAS and CUSPARSE libraries.
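
To show where the forward/backward substitutions sit inside the algorithm, here is a minimal serial sketch of preconditioned conjugate gradient with a preconditioner of the form M = L Lᵀ. It rests on several assumptions and is not the paper's GPUMICPCGA: the matrix is dense, the function names are hypothetical, and a complete Cholesky factor stands in for the modified incomplete Cholesky factor, whose construction the abstract does not describe.

```python
import math

# Serial sketch under stated assumptions; a complete Cholesky factor
# stands in for the MIC factor, which the abstract does not specify.

def cholesky(A):
    """Dense lower-triangular factor L with A = L L^T (A must be SPD)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b; y[i] depends on all earlier y[k], k < i."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def backward_sub(L, y):
    """Solve L^T z = y; z[i] depends on all later z[k], k > i."""
    n = len(y)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(A, x):
    return [dot(row, x) for row in A]

def pcg(A, b, L, tol=1e-10, max_iter=100):
    """Preconditioned CG for A x = b with preconditioner M = L L^T."""
    x = [0.0] * len(b)
    r = list(b)                                   # residual for x = 0
    z = backward_sub(L, forward_sub(L, r))        # z = M^{-1} r via FBS
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = backward_sub(L, forward_sub(L, r))    # FBS again, every iteration
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Illustrative SPD system whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]
solution = pcg(A, b, cholesky(A))
```

The sketch makes the bottleneck visible: `forward_sub` and `backward_sub` run once per iteration, and each unknown depends on previously computed ones, so these solves are the least naturally parallel step of the algorithm.
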
Journal: Journal of Parallel and Distributed Computing - Volume 74, Issue 2, February 2014, Pages 2088–2098