Article ID | Journal | Published Year | Pages |
---|---|---|---|
4956521 | Journal of Systems and Software | 2017 | 15 Pages |
Abstract
Recent supercomputers rated in the TOP 500 list increasingly utilize accelerator or co-processor devices to improve performance and energy efficiency. Since version 4.0 of the specification, OpenMP addresses this heterogeneity in computing with the target directives, which enable programmers to offload portions of the code to massively parallel target devices. Due to this new complexity in hardware and software design, performance optimization of large-scale parallel programs becomes more and more challenging. As manual performance analysis is becoming infeasible for complex high performance computing (HPC) codes, we propose an approach to automatically detect bottlenecks such as load imbalances in heterogeneous OpenMP applications. We developed a method to perform critical-path and root-cause analysis for the OpenMP 4.0 offloading model and integrated it into the tool CASITA. The post-mortem analysis is based on execution traces that are generated with an implementation of the evolving OpenMP Tools Interface in the measurement system Score-P. To validate the implementation of our method, we ported several existing codes to OpenMP 4.0, executed them with an Intel Xeon Phi as the target device, and analyzed the resulting trace files.
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Networks and Communications
Authors
Robert Dietrich, Felix Schmitt, Alexander Grund, Jonas Stolle,