Article code: 507149
Journal code: 865097
Publication year: 2015
English article: 11-page PDF
Full-text version: free download
English title of the ISI article
Accelerating DynEarthSol3D on tightly coupled CPU–GPU heterogeneous processors
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract


• We accelerate the Dynamic Earth Solver 3D (DynEarthSol3D) program on CPU–GPU heterogeneous processors.
• We propose a data transformation to improve GPU memory performance.
• We propose merging kernels to minimize kernel launch overhead.
• We show a performance gain over implementations on a discrete GPU and a multi-core CPU.

DynEarthSol3D (Dynamic Earth Solver in Three Dimensions) is a flexible, open-source finite element solver that models the momentum balance and the heat transfer of elasto-visco-plastic material in the Lagrangian form using unstructured meshes. It provides a platform for studying the long-term deformation of Earth's lithosphere and various problems in civil and geotechnical engineering. However, continuously computing and updating a very large mesh imposes an intolerably high computational burden on developers and users in practice. For example, simulating a small input mesh of around 3000 elements for 20 million time steps would take more than 10 days on a high-end desktop CPU. In this paper, we explore tightly coupled CPU–GPU heterogeneous processors to address this computational cost by leveraging their new features and developing hardware-architecture-aware optimizations. Our key optimization techniques are three-fold: memory access pattern improvement, data transfer elimination, and kernel launch overhead minimization. Experimental results show that our implementation on a tightly coupled heterogeneous processor outperforms all other alternatives, including a traditional discrete GPU, a quad-core CPU using OpenMP, and serial implementations, by 67%, 50%, and 154% respectively, even though the embedded GPU in the heterogeneous processor has significantly fewer cores than a high-end discrete GPU.
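
The abstract names a data transformation that improves GPU memory access patterns but does not describe it. As a minimal sketch of one common transformation of this kind, assuming it resembles an array-of-structures to structure-of-arrays conversion (an assumption, not something the abstract confirms), the C++ below shows the idea. `NodeAoS`, `NodesSoA`, and `to_soa` are hypothetical names chosen for illustration, not DynEarthSol3D identifiers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-node record (array-of-structures layout).
// The real DynEarthSol3D data structures may look different.
struct NodeAoS {
    double x, y, z;
};

// Hypothetical structure-of-arrays layout: each coordinate
// component is stored in its own contiguous array.
struct NodesSoA {
    std::vector<double> x, y, z;
};

// Convert the interleaved layout into three contiguous arrays.
NodesSoA to_soa(const std::vector<NodeAoS>& nodes) {
    NodesSoA out;
    const std::size_t n = nodes.size();
    out.x.reserve(n);
    out.y.reserve(n);
    out.z.reserve(n);
    for (const NodeAoS& node : nodes) {
        out.x.push_back(node.x);
        out.y.push_back(node.y);
        out.z.push_back(node.z);
    }
    return out;
}
```

With this layout, neighboring GPU threads that each read `x[i]` for consecutive `i` touch adjacent memory addresses, which is the usual motivation for such a conversion.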

Publisher
Database: Elsevier - ScienceDirect
Journal: Computers & Geosciences - Volume 79, June 2015, Pages 27–37