Article code: 431469
Journal code: 688555
Publication year: 2014
English article: 12-page PDF
Full-text version: free download
English title of the ISI article
Optimized FFT computations on heterogeneous platforms with application to the Poisson equation
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


• New strategy to decompose large multi-dimensional FFTs on CPU–GPU platforms.
• Execution of the GPU kernels is almost completely overlapped with PCIe bus transfers.
• Multi-dimensional data is transferred only once between the GPU and CPU.
• The scheme is equally effective for single and double precision computations.
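
As a rough illustration of why a multi-dimensional transform can be split into pieces at all, the sketch below computes a 2D DFT slab by slab using the standard row-column separability of the transform. It is not the decomposition strategy of the paper, whose details are not given on this page. The function names (`dft`, `dft2_direct`, `dft2_in_slabs`) and the `slab_rows` parameter are hypothetical, and a naive O(n^2) DFT stands in for a real FFT library so that the example stays self-contained.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft2_direct(a):
    """Reference 2D DFT evaluated straight from the definition."""
    rows, cols = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (r * u / rows + c * v / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]


def dft2_in_slabs(a, slab_rows):
    """2D DFT computed in slabs of rows, followed by a pass over columns.

    Each slab stands in for a chunk small enough to fit in device memory
    (an assumption made for illustration only).
    """
    rows, cols = len(a), len(a[0])
    stage = []
    for start in range(0, rows, slab_rows):
        slab = a[start:start + slab_rows]
        # Row transforms are independent, so a slab can be processed alone.
        stage.extend(dft(row) for row in slab)
    # The column pass needs data from every slab, so it runs afterwards.
    out = [[0j] * cols for _ in range(rows)]
    for v in range(cols):
        column = dft([stage[r][v] for r in range(rows)])
        for u in range(rows):
            out[u][v] = column[u]
    return out
```

The row pass touches each slab independently, which is the property that makes chunked processing possible; the column pass is where data from different slabs has to meet, and how that step is organized between CPU and GPU is what a scheme such as the one summarized above has to decide.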

We develop optimized multi-dimensional FFT implementations on CPU–GPU heterogeneous platforms for the case when the input is too large to fit in the GPU global memory, and use the resulting techniques to develop a fast Poisson solver. The solver involves memory-bound computations for which the large 3D data may have to be transferred over the PCIe bus several times during the computation. We develop a new strategy to decompose and allocate the computation between the GPU and the CPU such that the 3D data is transferred only once to the device memory, and the executions of the GPU kernels are almost completely overlapped with the PCIe data transfer. We were able to achieve significantly better performance than what has been reported in previous related work, including over 145 GFLOPS for the three periodic boundary conditions (single precision version), and over 105 GFLOPS for the two periodic, one Neumann boundary conditions (single precision version). The effective bidirectional PCIe bus bandwidth achieved is 9–10 GB/s, which is close to the best possible on our platform. For all the cases tested, the single 3D data PCIe transfer time, which constitutes a lower bound on what is possible on our platform, takes almost 70% of the total execution time of the Poisson solver.
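
To make the link between FFTs and the Poisson equation concrete, here is a minimal one-dimensional sketch with periodic boundary conditions: transform the right-hand side, divide each Fourier mode by -k^2, and transform back. It assumes the continuous spectral symbol -k^2 on the domain [0, 2π) and fixes the solution's mean to zero; the paper's 3D solver, its discretization, and its Neumann case may differ. The names `dft` and `solve_poisson_periodic_1d` are hypothetical, and a naive DFT again replaces a real FFT.

```python
import cmath
import math


def dft(x, sign=-1):
    """Naive O(n^2) DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def solve_poisson_periodic_1d(f):
    """Solve u'' = f on [0, 2*pi) with periodic boundaries, returning zero-mean u.

    f holds samples at x_j = 2*pi*j/n and is assumed to have zero mean,
    otherwise no periodic solution exists.
    """
    n = len(f)
    f_hat = dft(f)
    u_hat = [0j] * n  # mode 0 stays zero: the mean of u is fixed to 0
    for m in range(1, n):
        k = m if m <= n // 2 else m - n  # signed wavenumber for index m
        u_hat[m] = -f_hat[m] / (k * k)   # u'' = f  <=>  -k^2 * u_hat = f_hat
    u = dft(u_hat, sign=+1)
    return [value.real / n for value in u]


# Usage: f = -sin(x) has the exact zero-mean solution u = sin(x).
n = 16
xs = [2 * math.pi * j / n for j in range(n)]
u = solve_poisson_periodic_1d([-math.sin(x) for x in xs])
max_error = max(abs(ui - math.sin(x)) for ui, x in zip(u, xs))
```

In three dimensions the same pattern applies with a 3D transform and a divisor built from all three wavenumbers, so the arithmetic per grid point is small compared with the data movement, which is consistent with the abstract's description of the solver as memory bound.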

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 74, Issue 8, August 2014, Pages 2745–2756