Article ID: 523836
Journal: Parallel Computing
Published Year: 2016
Pages: 11
File Type: PDF
Abstract

• Specialized implementations of ILUPACK’s iterative solver for NUMA platforms.
• Specialized implementations of ILUPACK’s iterative solver for many-core accelerators.
• Exploitation of task parallelism via the OmpSs runtime (dynamic schedule).
• Exploitation of task parallelism via MPI (static schedule).
• Exploitation of data parallelism for GPUs.

We present specialized implementations of the preconditioned iterative linear system solver in ILUPACK for Non-Uniform Memory Access (NUMA) platforms and for many-core hardware co-processors based on the Intel Xeon Phi and graphics accelerators. For conventional x86 architectures, our approach exploits task parallelism via the OmpSs runtime as well as a message-passing implementation based on MPI, respectively yielding a dynamic and a static schedule of the work to the cores, with numeric semantics that differ from those of the sequential ILUPACK. For the graphics processor we exploit data parallelism by off-loading the computationally expensive kernels to the accelerator while preserving the numeric semantics of the sequential case.
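To make the task-parallel strategy concrete, the following is a minimal sketch of how the application of a block-structured preconditioner can be decomposed into tasks over a (tiny) nested-dissection tree and handed to a runtime for dynamic scheduling. It uses standard OpenMP tasks as a stand-in for the OmpSs runtime named above, and the task tree, block sizes and solve/update kernels are hypothetical placeholders, not ILUPACK's actual data structures or API.

/* Hedged sketch: task-parallel application of a block-structured
 * preconditioner over a two-level nested-dissection tree.
 * OpenMP tasks stand in for the OmpSs runtime; all kernels are
 * illustrative placeholders, not ILUPACK code. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define NLEAVES 4        /* leaves of a tiny nested-dissection tree   */
#define BLOCK   1024     /* unknowns per leaf block (illustrative)    */

/* Placeholder for the local forward/backward solve on one leaf block. */
static void solve_block(double *x, int n, int leaf)
{
    for (int i = 0; i < n; ++i)
        x[i] *= 0.5;     /* stand-in for the L/U triangular solves */
    printf("leaf %d solved by thread %d\n", leaf, omp_get_thread_num());
}

/* Placeholder for the separator (coupling) update that must wait
 * until its two child blocks have been processed. */
static void update_separator(double *xl, double *xr, int n)
{
    for (int i = 0; i < n; ++i)
        xl[i] += xr[i];
}

int main(void)
{
    double *x[NLEAVES];
    for (int l = 0; l < NLEAVES; ++l) {
        x[l] = malloc(BLOCK * sizeof(double));
        for (int i = 0; i < BLOCK; ++i) x[l][i] = 1.0;
    }

    #pragma omp parallel
    #pragma omp single
    {
        /* Leaf tasks are independent; the runtime schedules them
         * dynamically onto the available cores. */
        for (int l = 0; l < NLEAVES; ++l) {
            #pragma omp task depend(out: x[l][0]) firstprivate(l)
            solve_block(x[l], BLOCK, l);
        }
        /* Separator tasks start as soon as their two children finish. */
        #pragma omp task depend(inout: x[0][0]) depend(in: x[1][0])
        update_separator(x[0], x[1], BLOCK);
        #pragma omp task depend(inout: x[2][0]) depend(in: x[3][0])
        update_separator(x[2], x[3], BLOCK);
        /* Root task closes the tree once both separators are done. */
        #pragma omp task depend(inout: x[0][0]) depend(in: x[2][0])
        update_separator(x[0], x[2], BLOCK);
    }   /* implicit barrier: all tasks complete here */

    for (int l = 0; l < NLEAVES; ++l) free(x[l]);
    return 0;
}

Under OmpSs the same dependence-driven structure applies, with the runtime deciding at execution time which core runs each task (the dynamic schedule), whereas the MPI variant fixes the mapping of subtrees to processes up front, which is what yields the static schedule mentioned above.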

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science Applications