Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
523836 | 868503 | 2016 | 11-page PDF | Free download |

• Specialized implementations of ILUPACK’s iterative solver for NUMA platforms.
• Specialized implementations of ILUPACK’s iterative solver for many-core accelerators.
• Exploitation of task parallelism via OmpSs runtime (dynamic schedule).
• Exploitation of task parallelism via MPI (static schedule).
• Exploitation of data parallelism for GPUs.
We present specialized implementations of the preconditioned iterative linear system solver in ILUPACK for Non-Uniform Memory Access (NUMA) platforms and many-core hardware co-processors based on the Intel Xeon Phi and graphics accelerators. For the conventional x86 architectures, our approach exploits task parallelism via the OmpSs runtime as well as via a message-passing implementation based on MPI. These yield, respectively, a dynamic and a static schedule of the work across the cores, with numeric semantics that differ from those of the sequential ILUPACK. For the graphics processor, we exploit data parallelism by off-loading the computationally expensive kernels to the accelerator while preserving the numeric semantics of the sequential case.
Journal: Parallel Computing - Volume 54, May 2016, Pages 97–107