Article ID: 1564130
Journal: Computational Materials Science
Published Year: 2008
Pages: 8
File Type: PDF
Abstract

We propose and implement a parallelization scheme for the ab initio plane-wave code ABINIT, based on an efficient multiband eigenvalue solver, the locally optimal block preconditioned conjugate gradient (LOBPCG) method, and on an optimized three-dimensional (3D) fast Fourier transform (FFT). In addition to the standard data partitioning over processors corresponding to different k-points, we introduce data partitioning with respect to blocks of bands, as well as spatial partitioning in Fourier space of the coefficients over the plane-wave basis set used in ABINIT. This k-points-multiband-FFT parallelization avoids any collective communication over the whole set of processors, relying instead on one-dimensional communications only. For a single k-point, super-linear scaling is achieved for up to 100 processors, owing to extensive use of hardware-optimized BLAS, LAPACK and ScaLAPACK routines, mainly in the LOBPCG routine. We observe good performance up to 200 processors. With 10 k-points, our three-way data partitioning results in linear scaling up to 1000 processors for a practical system used for testing.
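
To make the three-way partitioning concrete, the following is a minimal sketch of how a flat processor rank could be mapped onto a (k-point, band block, FFT slab) grid, and how the one-dimensional communication groups along the band and FFT directions could be enumerated. It is not ABINIT's implementation: the function names, the row-major rank ordering, and the 2 x 3 x 4 grid used in the usage note are illustrative assumptions.

```python
from typing import List, NamedTuple


class GridCoords(NamedTuple):
    """Position of one processor in the hypothetical 3D processor grid."""
    kpt: int   # index of the k-point group
    band: int  # index of the band block within that k-point
    fft: int   # index of the FFT (plane-wave) slab within that band block


def rank_to_coords(rank: int, n_kpt: int, n_band: int, n_fft: int) -> GridCoords:
    """Map a flat rank to grid coordinates (row-major ordering is assumed)."""
    if not 0 <= rank < n_kpt * n_band * n_fft:
        raise ValueError("rank outside the processor grid")
    kpt, rest = divmod(rank, n_band * n_fft)
    band, fft = divmod(rest, n_fft)
    return GridCoords(kpt, band, fft)


def coords_to_rank(c: GridCoords, n_band: int, n_fft: int) -> int:
    """Inverse of rank_to_coords for the same row-major ordering."""
    return (c.kpt * n_band + c.band) * n_fft + c.fft


def band_peers(rank: int, n_kpt: int, n_band: int, n_fft: int) -> List[int]:
    """Ranks sharing this rank's k-point and FFT slab: a 1D group along bands."""
    c = rank_to_coords(rank, n_kpt, n_band, n_fft)
    return [coords_to_rank(GridCoords(c.kpt, b, c.fft), n_band, n_fft)
            for b in range(n_band)]


def fft_peers(rank: int, n_kpt: int, n_band: int, n_fft: int) -> List[int]:
    """Ranks sharing this rank's k-point and band block: a 1D group along FFT slabs."""
    c = rank_to_coords(rank, n_kpt, n_band, n_fft)
    return [coords_to_rank(GridCoords(c.kpt, c.band, f), n_band, n_fft)
            for f in range(n_fft)]
```

For example, with a hypothetical grid of 2 k-points, 3 band blocks and 4 FFT slabs (24 processors), `band_peers(5, 2, 3, 4)` lists the 3 ranks that would exchange data along the band direction, and `fft_peers(5, 2, 3, 4)` lists the 4 ranks along the FFT direction. Each processor only ever talks to one such line of the grid at a time, which is the sense in which communications stay one-dimensional.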
