Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6930467 | Journal of Computational Physics | 2016 | 16 Pages | |
Abstract
We demonstrate significant speed-ups of ∼100×, arising from a combination of algorithmic improvements and architecture-aware optimisations targeted at improving thread and vectorisation behaviour. The resulting hybrid MPI/OpenMP code can execute on clusters containing processors and/or coprocessors, with a strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3×, and that running the same code across a combination of both microarchitectures improves performance per node by a factor of 3.38×. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator), we are now able to consider joint analyses that exploit new data for cosmological science.
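The 98.6% figure above is a strong-scaling (fixed problem size) parallel efficiency. Under the conventional definition, and assuming a single-node baseline (the abstract does not state which baseline was used), efficiency on N nodes is E(N) = T(1) / (N · T(N)), where T(N) is the wall-clock time on N nodes. The sketch below computes it; the function names and the timings in the example are hypothetical illustrations, not values or code from the paper.

```python
def speedup(t_base: float, t_n: float) -> float:
    """Speed-up over the baseline run: S(N) = T(1) / T(N)."""
    if t_base <= 0 or t_n <= 0:
        raise ValueError("timings must be positive")
    return t_base / t_n


def strong_scaling_efficiency(t_base: float, t_n: float, n: int) -> float:
    """Strong-scaling efficiency for a fixed problem size: E(N) = T(1) / (N * T(N)).

    Assumes t_base was measured on a single node.
    """
    if n <= 0:
        raise ValueError("node count must be positive")
    return speedup(t_base, t_n) / n


if __name__ == "__main__":
    # Hypothetical timings in seconds (NOT measurements from the paper):
    # a 1-node run of 1000 s and a 16-node run of 63.4 s.
    t1, t16 = 1000.0, 63.4
    print(f"speed-up:   {speedup(t1, t16):.2f}x")
    print(f"efficiency: {strong_scaling_efficiency(t1, t16, 16):.1%}")
```

Read the other way, an efficiency of 98.6% on 16 nodes corresponds to a speed-up of 0.986 × 16 ≈ 15.8× over one node, close to the ideal 16×.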
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science Applications
Authors
J.P. Briggs, S.J. Pennycook, J.R. Fergusson, J. Jäykkä, E.P.S. Shellard