Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
523836 | Parallel Computing | 2016 | 11 Pages |
• Specialized implementations of ILUPACK’s iterative solver for NUMA platforms.
• Specialized implementations of ILUPACK’s iterative solver for many-core accelerators.
• Exploitation of task parallelism via OmpSs runtime (dynamic schedule).
• Exploitation of task parallelism via MPI (static schedule).
• Exploitation of data parallelism for GPUs.
We present specialized implementations of the preconditioned iterative linear system solver in ILUPACK for Non-Uniform Memory Access (NUMA) platforms and for many-core hardware co-processors based on the Intel Xeon Phi and on graphics accelerators. For conventional x86 architectures, our approach exploits task parallelism in two ways: through the OmpSs runtime, which yields a dynamic schedule of the work to the cores, and through a message-passing implementation based on MPI, which yields a static schedule. Both variants have numeric semantics that differ from those of the sequential ILUPACK. For the graphics processor, we exploit data parallelism by off-loading the computationally expensive kernels to the accelerator while preserving the numeric semantics of the sequential case.
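To make the term "preconditioned iterative solver" concrete, the following is a minimal sequential sketch, not ILUPACK's implementation. It assumes the conjugate gradient method as the iterative solver, which the abstract does not specify, and it substitutes a simple diagonal (Jacobi) preconditioner for ILUPACK's incomplete-LU preconditioner. All function names (`matvec`, `dot`, `pcg`, `jacobi_preconditioner`) are hypothetical. In a loop of this shape, the matrix-vector product and the preconditioner application usually dominate the cost, which makes them the natural candidates for the kind of off-loading the abstract describes.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    apply_preconditioner(r) returns an approximation to A^{-1} r.
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial guess x = 0
    z = apply_preconditioner(r)
    p = list(z)                      # first search direction
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)            # kernel 1: matrix-vector product
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        z = apply_preconditioner(r)  # kernel 2: preconditioner application
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def jacobi_preconditioner(A):
    """Stand-in preconditioner (assumption): scale by the inverse diagonal of A."""
    inv_diag = [1.0 / A[i][i] for i in range(len(A))]
    return lambda r: [d * ri for d, ri in zip(inv_diag, r)]


# Small symmetric positive definite test system (illustrative data only).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = pcg(A, b, jacobi_preconditioner(A))
```

Passing the preconditioner as a function keeps the solver loop unchanged when a stronger preconditioner, such as an incomplete factorization, replaces the diagonal scaling.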