Article ID: 6930467
Journal: Journal of Computational Physics
Published: 2016
Pages: 16
File type: PDF
Abstract
We demonstrate significant speed-ups of ≈100×, arising from a combination of algorithmic improvements and architecture-aware optimisations that target thread and vectorisation behaviour. The resulting hybrid MPI/OpenMP code can execute on clusters containing processors, coprocessors, or both, with a strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3× and that running the same code across both microarchitectures at once improves performance per node by a factor of 3.38×. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator), we can now consider joint analyses of both statistics for the scientific exploitation of new cosmological data.
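As a reminder of what a strong-scaling efficiency figure such as 98.6% measures, the sketch below uses the standard definition: for a fixed problem size, efficiency is the baseline node-time product divided by the node-time product on N nodes, E = (n_ref · t_ref) / (N · t_N), so 1.0 means perfectly linear speed-up. The abstract does not state which run serves as the baseline, so the `n_ref` parameter (defaulting to one node) and all timings below are hypothetical, chosen only for illustration.

```python
def strong_scaling_efficiency(t_ref: float, t_n: float, n: int, n_ref: int = 1) -> float:
    """Strong-scaling efficiency for a fixed problem size.

    t_ref: wall-clock time of the (hypothetical) baseline run on n_ref nodes
    t_n:   wall-clock time of the same problem on n nodes
    Returns (n_ref * t_ref) / (n * t_n); 1.0 means perfectly linear speed-up.
    """
    if n <= 0 or n_ref <= 0 or t_n <= 0:
        raise ValueError("node counts and runtimes must be positive")
    return (n_ref * t_ref) / (n * t_n)


# Hypothetical timings: 160 s on 1 node, 10 s on 16 nodes -> perfect scaling.
print(f"{strong_scaling_efficiency(160.0, 10.0, 16):.3f}")

# Runtime on 16 nodes that would correspond to 98.6% efficiency
# for a hypothetical 1600 s single-node baseline.
t16 = 1600.0 / (16 * 0.986)
print(f"{strong_scaling_efficiency(1600.0, t16, 16):.3f}")
```

Defining efficiency relative to a configurable baseline (`n_ref`) keeps the sketch usable when single-node runs are impractical and scaling is measured from a larger starting allocation.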