Separable projection integrals for higher-order correlators of the cosmic microwave sky: Acceleration by factors exceeding 100

Article ID	Journal	Published Year	Pages	File Type
6930467	Journal of Computational Physics	2016	16 Pages	PDF

Abstract

We demonstrate significant speed-ups of approximately 100×, arising from a combination of algorithmic improvements and architecture-aware optimisations targeted at improving thread and vectorisation behaviour. The resulting MPI/OpenMP hybrid code is capable of executing on clusters containing processors and/or coprocessors, with strong-scaling efficiency of 98.6% on up to 16 nodes. We find that a single coprocessor outperforms two processor sockets by a factor of 1.3× and that running the same code across a combination of both microarchitectures improves performance-per-node by a factor of 3.38×. By making bispectrum calculations competitive with those for the power spectrum (or two-point correlator) we are now able to consider joint analysis for cosmological science exploitation of new data.
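
As a reading aid for the strong-scaling figure quoted above, the following minimal sketch shows how such an efficiency is conventionally defined for a fixed problem size. The function names are illustrative, and the single-node baseline is an assumption: the abstract does not state which reference configuration the 98.6% figure is measured against.

```python
def strong_scaling_efficiency(t_ref, t_n, nodes, ref_nodes=1):
    """Achieved speed-up divided by ideal speed-up, for a fixed problem size.

    t_ref: wall-clock time on the reference configuration (ref_nodes nodes)
    t_n:   wall-clock time on `nodes` nodes
    """
    speedup = t_ref / t_n        # how much faster the larger run is
    ideal = nodes / ref_nodes    # perfect linear scaling
    return speedup / ideal


def implied_speedup(efficiency, nodes, ref_nodes=1):
    """Speed-up implied by a given efficiency (inverse of the function above)."""
    return efficiency * nodes / ref_nodes


# Assumption: baseline of one node. Under it, 98.6% efficiency on 16 nodes
# corresponds to a speed-up of about 15.8 out of an ideal 16.
print(round(implied_speedup(0.986, 16), 3))
```

Under a different baseline (for example, two nodes) the implied speed-up changes accordingly, which is why the reference configuration matters when comparing efficiency figures.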

Keywords

Xeon Phi; Many-core; Nested parallelism; Cosmology