Article ID Journal Published Year Pages File Type
486671 Procedia Computer Science 2012 9 Pages PDF
Abstract

This paper describes our progress in developing software for performing parallel LU factorization of a large dense matrix on a GPU cluster. Three approaches, with increasing software complexity, are considered: (i) a naive “thunking” approach that links the existing parallel ScaLAPACK software library with cuBLAS through a software emulation layer; (ii) a more intrusive magmaBLAS implementation integrated into the LU solver in the High-Performance Linpack software; and (iii) a left-looking out-of-core algorithm for solving problems that are larger than the available memory on GPU devices. Comparisons of the performance gains versus the current ScaLAPACK PZGETRF are provided.
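
To make the term "left-looking" in approach (iii) concrete, the following is a minimal serial sketch in plain Python, not the authors' GPU implementation. It assumes a small square matrix stored as a list of rows and, for brevity, omits the partial pivoting that PZGETRF performs, so it is only safe for matrices that need no row exchanges. The function name `left_looking_lu` is hypothetical. The point it illustrates is the access pattern: each column is updated using only the already-factored columns to its left and is then finalized, which is why only the current column (or panel) and the columns being read need to be resident in fast memory at any one time.

```python
def left_looking_lu(a):
    """Return the packed LU factors of square matrix `a` (list of rows).

    Left-looking (column-by-column) variant WITHOUT pivoting (an
    assumption made for brevity). The result stores U on and above the
    diagonal and the strictly lower part of unit-lower-triangular L
    below the diagonal.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy; the input is unchanged

    for j in range(n):
        # "Look left": apply every previously factored column k < j
        # to the current column j. Only column j is written here.
        for k in range(j):
            u_kj = lu[k][j]  # final value of U[k][j] at this point
            for i in range(k + 1, n):
                lu[i][j] -= lu[i][k] * u_kj

        # Column j is now fully updated; scale the subdiagonal entries
        # by the pivot to obtain the multipliers L[i][j].
        pivot = lu[j][j]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(j + 1, n):
            lu[i][j] /= pivot

    return lu
```

In an out-of-core setting the inner loop over `k` would correspond to streaming previously factored panels from host memory to the device, while the current panel stays on the GPU until it is complete; that mapping is an interpretation of the abstract rather than a detail it states.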
