Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
486671 | Procedia Computer Science | 2012 | 9 Pages |
Abstract
This paper describes our progress in developing software for performing parallel LU factorization of a large dense matrix on a GPU cluster. Three approaches, with increasing software complexity, are considered: (i) a naive “thunking” approach that links the existing parallel ScaLAPACK software library with cuBLAS through a software emulation layer; (ii) a more intrusive magmaBLAS implementation integrated into the LU solver in the High-Performance Linpack software; and (iii) a left-looking out-of-core algorithm for solving problems that are larger than the available memory on GPU devices. Comparisons of the performance gains versus the current ScaLAPACK PZGETRF are provided.
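To illustrate the left-looking idea behind approach (iii), here is a minimal serial sketch in plain Python. It is not the paper's implementation: the function name `left_looking_lu`, the `block` parameter, and the in-memory list-of-lists storage are assumptions made for illustration, and the matrix is assumed to be square, real, and nonsingular. The sketch shows the property that makes left-looking variants attractive for out-of-core work: each column panel is first updated with all columns to its left, then factored, and is never written again.

```python
def left_looking_lu(matrix, block=2):
    """Left-looking panel LU with partial pivoting (illustrative sketch).

    Returns (lu, perm): lu holds the unit-lower factor L below the diagonal
    and U on and above it; row i of L*U equals row perm[i] of the input.
    Assumes a square, nonsingular matrix.
    """
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    perm = list(range(n))

    for j0 in range(0, n, block):
        j1 = min(j0 + block, n)

        # Step 1: update the current panel using only the columns to its left.
        # Rows above the panel do a forward substitution with L (k < i);
        # rows inside the panel subtract the full product L[i, :j0] * U[:j0, j].
        for j in range(j0, j1):
            for i in range(n):
                kmax = min(i, j0)
                a[i][j] -= sum(a[i][k] * a[k][j] for k in range(kmax))

        # Step 2: factor the updated panel with partial pivoting.
        # Whole rows are swapped, so columns to the right, which have not
        # been touched yet, already carry the permutation when reached.
        for k in range(j0, j1):
            p = max(range(k, n), key=lambda r: abs(a[r][k]))
            if p != k:
                a[k], a[p] = a[p], a[k]
                perm[k], perm[p] = perm[p], perm[k]
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
                for j in range(k + 1, j1):
                    a[i][j] -= a[i][k] * a[k][j]

    return a, perm
```

In an out-of-core setting, the panel in Step 1 would be the only part of the matrix that must reside in device memory for writing, while the already-factored columns to the left are streamed in for reading. This sketch keeps everything in host memory and does not model that data movement.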
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science (General)