Article ID: 768711
Journal: Computers & Fluids
Published Year: 2012
Pages: 13
File Type: PDF
Abstract

We present a memory-efficient, parallel framework for finite element operator application, implemented in the generic open-source library deal.II. Instead of assembling a sparse matrix and using it for matrix–vector products, the operator is applied by cell-wise quadrature. The evaluation of shape functions is implemented with a sum-factorization approach. Our implementation is parallelized on three levels to exploit modern supercomputer architectures: MPI across nodes, thread parallelization with dynamic task scheduling within each node, and explicit vectorization to utilize the processors’ vector units. Special data structures are designed for high performance and minimal memory requirements. The framework handles adaptively refined meshes and systems of partial differential equations. Performance tests for both linear and nonlinear PDEs show that our cell-based implementation is faster than sparse matrix–vector products for polynomial order two and higher on hexahedral elements and yields ten times higher Gflop/s rates.
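To make the cell-wise, sum-factorized evaluation more concrete, the following self-contained C++ sketch computes the values of a scalar finite element solution at the quadrature points of a single hexahedral cell by three one-dimensional sweeps. It illustrates the general technique only, not the deal.II implementation; the names apply_1d, evaluate_values and shape_1d, as well as the assumption of n one-dimensional quadrature points for n one-dimensional basis functions, are ours.

// Minimal sketch of sum-factorization on one hexahedral cell. The cell has
// n^3 degrees of freedom and n^3 quadrature points (n points per direction).
// shape_1d[q * n + i] holds the value of 1D basis function i at 1D
// quadrature point q.
#include <vector>

// Apply the n x n 1D shape-value matrix along direction 'dir' of an n^3
// array stored lexicographically (index = i0 + n*i1 + n^2*i2).
void apply_1d(const std::vector<double> &shape_1d, int n, int dir,
              const std::vector<double> &in, std::vector<double> &out)
{
  for (int i2 = 0; i2 < n; ++i2)
    for (int i1 = 0; i1 < n; ++i1)
      for (int i0 = 0; i0 < n; ++i0)
        {
          const int idx[3] = {i0, i1, i2};
          double sum = 0.;
          for (int k = 0; k < n; ++k)      // contract only the 'dir' index
            {
              int jdx[3] = {i0, i1, i2};
              jdx[dir] = k;
              sum += shape_1d[idx[dir] * n + k] *
                     in[jdx[0] + n * (jdx[1] + n * jdx[2])];
            }
          out[idx[0] + n * (idx[1] + n * idx[2])] = sum;
        }
}

// Evaluate the cell solution at all quadrature points by three 1D sweeps,
// at a cost of O(n^4) per cell instead of O(n^6) for a dense interpolation
// matrix from the n^3 dofs to the n^3 quadrature points.
std::vector<double> evaluate_values(const std::vector<double> &shape_1d, int n,
                                    const std::vector<double> &dof_values)
{
  std::vector<double> tmp(dof_values), out(tmp.size());
  apply_1d(shape_1d, n, 0, tmp, out);      // sweep in x
  tmp.swap(out);
  apply_1d(shape_1d, n, 1, tmp, out);      // sweep in y
  tmp.swap(out);
  apply_1d(shape_1d, n, 2, tmp, out);      // sweep in z
  return out;
}

A cell-based operator application then loops over all cells: it gathers the local degrees of freedom from the source vector, performs such sweeps together with a pointwise operation at the quadrature points, integrates back by the transposed sweeps, and scatters the result into the destination vector.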

► Implementation framework for finite element operator application.
► Efficient data structures for high performance, including sum-factorization.
► Hybrid parallelization including MPI, shared memory, and vectorization (illustrated in the sketch after this list).
► Operator application reaches up to 70% of the system’s peak performance.
► Framework outperforms sparse matrix–vector products for element order two and higher.
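The hybrid parallelization highlighted above can be pictured with the following skeleton. It is an illustrative sketch under our own assumptions, not the framework itself: OpenMP with dynamic scheduling stands in for the dynamic task scheduler, a compiler simd hint stands in for explicit vectorization, and the batch layout, the constant simd_width and the dummy cell kernel are ours. MPI distributes the mesh across nodes, a threaded loop walks over batches of cells within a node, and each batch interleaves the data of several cells so that one arithmetic instruction processes several cells at once.

// Skeleton of the three parallelization levels; all names are illustrative.
#include <mpi.h>
#include <cstddef>
#include <vector>

constexpr int simd_width = 4;               // e.g. 4 doubles per AVX register

struct CellBatch                            // data of simd_width cells, interleaved
{
  std::vector<double> dof_values;           // layout: [local dof][cell lane]
};

void apply_local_operator(CellBatch &batch)
{
  // Placeholder for the sum-factorized cell kernel: because the innermost
  // dimension holds simd_width cells, each operation acts on several cells.
  for (std::size_t i = 0; i < batch.dof_values.size(); i += simd_width)
    {
#pragma omp simd                            // level 3: vectorization across cells
      for (int lane = 0; lane < simd_width; ++lane)
        batch.dof_values[i + lane] *= 2.0;  // dummy pointwise operation
    }
}

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);                   // level 1: MPI across nodes; each
                                            // rank owns a subset of cell batches
  std::vector<CellBatch> my_batches(
    128, CellBatch{std::vector<double>(27 * simd_width, 1.0)}); // 27 dofs: Q2 hex

#pragma omp parallel for schedule(dynamic)  // level 2: threads within the node
  for (std::size_t b = 0; b < my_batches.size(); ++b)
    apply_local_operator(my_batches[b]);

  MPI_Finalize();
  return 0;
}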

Related Topics
Physical Sciences and Engineering › Engineering › Computational Mechanics