Article ID | Journal ID | Publication year | English article | Full-text version |
---|---|---|---|---|
523878 | 868516 | 2015 | 20-page PDF | Free download |
• A model to estimate the performance of graph partitioning running on heterogeneous multi-core clusters is proposed.
• We discover pitfalls of conventional methodologies in obtaining model parameters from multi-core systems.
• The impact of intra-node contention is too significant to be ignored.
• Modeling accuracy depends on whether overlap is adequately considered.
• Characteristics of input meshes may affect memory access behavior and hence become a determining factor.
Considering application behavior in graph partitioning is an arduous task because of a chicken-and-egg problem: the application behavior depends on how the graph is decomposed, while achieving load balance requires knowledge of how the application utilizes the underlying resources. Advances in multi-core processors further complicate the endeavor by introducing hardware diversity and intra-node contention. As an attempt to quantify performance for partitioning refinement, we propose a model that predicts execution times of iterative mesh-based applications running on heterogeneous multi-core clusters. Apart from considering resource heterogeneity, the model takes into account hierarchical communication characteristics, overlap between computation and communication, and performance penalties due to intra-node contention. We present a detailed methodology for obtaining key parameters from a real system and highlight potential pitfalls of conventional approaches to obtaining these parameters. Experiments were conducted using a synthetic application benchmark that solves a partial differential equation. The evaluation shows good agreement between measured execution times and the model's predictions.
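The abstract names the model's ingredients but gives no formulas, so the following Python sketch is only an illustration of how such ingredients might combine; it is not the paper's model. Every name here (`Partition`, `Network`, `iteration_time`, `predicted_time`), the latency-plus-bandwidth communication cost, the multiplicative contention factor, and the single `overlap` fraction are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical description of the work assigned to one process."""
    cells: int                # mesh cells owned by this process
    sec_per_cell: float       # compute cost per cell on this core (heterogeneity)
    intra_bytes: int          # halo data exchanged with processes on the same node
    inter_bytes: int          # halo data exchanged with processes on other nodes
    contention: float = 1.0   # assumed slowdown factor (>= 1) from sharing a node


@dataclass
class Network:
    """Hypothetical two-level (intra-node / inter-node) communication costs."""
    intra_latency: float
    intra_bandwidth: float
    inter_latency: float
    inter_bandwidth: float


def comm_time(p: Partition, net: Network) -> float:
    # Assumed cost per level: latency + bytes / bandwidth, skipped if no data.
    t = 0.0
    if p.intra_bytes:
        t += net.intra_latency + p.intra_bytes / net.intra_bandwidth
    if p.inter_bytes:
        t += net.inter_latency + p.inter_bytes / net.inter_bandwidth
    return t


def iteration_time(p: Partition, net: Network, overlap: float = 0.0) -> float:
    # overlap in [0, 1]: fraction of the shorter phase hidden behind the longer.
    comp = p.cells * p.sec_per_cell * p.contention
    comm = comm_time(p, net)
    return comp + comm - overlap * min(comp, comm)


def predicted_time(parts, net: Network, iterations: int, overlap: float = 0.0) -> float:
    # An iterative application advances at the pace of its slowest process.
    return iterations * max(iteration_time(p, net, overlap) for p in parts)


net = Network(intra_latency=1.0, intra_bandwidth=10.0,
              inter_latency=2.0, inter_bandwidth=5.0)
fast = Partition(cells=100, sec_per_cell=0.5, intra_bytes=10, inter_bytes=20)
slow = Partition(cells=100, sec_per_cell=1.0, intra_bytes=10, inter_bytes=20,
                 contention=1.5)
total = predicted_time([fast, slow], net, iterations=10)
```

In this toy setup both processes own the same number of cells, yet the slower, contended core dominates the prediction. That is the kind of imbalance a partitioner would need such a performance estimate to correct.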
Journal: Parallel Computing - Volume 46, July 2015, Pages 78–97