Article code: 523781
Journal code: 868491
Publication year: 2015
English article: 17-page PDF
Full-text version: free download
English title of the ISI article
Time-domain BEM for the wave equation on distributed-heterogeneous architectures: A blocking approach
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract


• We parallelize the TD-BEM on clusters of multicore nodes enhanced with multiple GPUs.
• SpMV is replaced by a more efficient operator by reordering the computation.
• GPU kernels achieve high Flop-rate using blocking schemes.
• Idle time is drastically reduced by using a greedy balancing algorithm.
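
The last highlight mentions a greedy balancing algorithm without giving its details. As a rough illustration only, the sketch below shows a generic greedy rule: each work item goes to the worker that would finish it earliest, given per-worker speeds. It is a static textbook variant, not the dynamic heuristic of the paper; the function name `greedy_balance` and the `costs` and `speeds` inputs are assumptions made for this example.

```python
def greedy_balance(costs, speeds):
    """Greedy assignment of work items to workers of different speeds.

    Hypothetical helper for illustration: costs[i] is the amount of work in
    item i, speeds[w] is the work per unit time that worker w can process.
    """
    assignment = [[] for _ in speeds]   # item indices given to each worker
    finish = [0.0] * len(speeds)        # projected finish time of each worker

    # Place the largest items first so that small items fill the gaps later.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for i in order:
        # Pick the worker that would complete this item the earliest.
        w = min(range(len(speeds)),
                key=lambda w: finish[w] + costs[i] / speeds[w])
        assignment[w].append(i)
        finish[w] += costs[i] / speeds[w]
    return assignment, finish


# Example: one slow worker (speed 1.0) and one fast worker (speed 2.0).
assignment, finish = greedy_balance([4, 2, 2], [1.0, 2.0])
print(assignment)  # [[1], [0, 2]]
print(finish)      # [2.0, 3.0]
```

In a dynamic setting the speeds would presumably be re-estimated from measured times between iterations; that part is left out here.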

The problem of time-domain BEM for the wave equation in acoustics and electromagnetism can be expressed as a sparse linear system composed of multiple interaction/convolution matrices. It can be solved with sparse matrix-vector products, but these fail to reach a high Flop-rate on either CPUs or GPUs. In this paper we extend the approach proposed in a previous work [1], in which we re-order the computation to get a special matrix structure with one dense vector per row. This new structure is called a slice matrix and is computed with a custom matrix/vector product operator. In this study, we present an optimized implementation of this operator on Nvidia GPUs based on two blocking strategies. We explain how multiple block values can be obtained from a slice and how they can be computed efficiently on GPUs. Because we target heterogeneous nodes composed of CPUs and GPUs, whose processing units differ in efficiency, we use a greedy heuristic that dynamically balances work among the workers. We demonstrate the performance of our system by studying the quality of the balancing heuristic and the sequential Flop-rate of the blocked implementations. Finally, we validate our implementation with an industrial test case on 8 heterogeneous nodes, each composed of 12 CPUs and 3 GPUs.
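
To make the re-ordering idea more concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the quantity of interest is a sum of sparse products, s = Σ_k M^k · a^(n−k), and that for a given pair (i, j) the non-zero entries occupy consecutive values of k. Grouping entries by column j then gives rows that are dense vectors over k, and each row is multiplied with the time history of unknown j. All names (`spmv_sum`, `build_slices`, `slice_sum`, `mats`, `past`) are hypothetical.

```python
def spmv_sum(mats, past):
    """Reference: s = sum_k M^k * past[k], each M^k stored as {(i, j): value}."""
    s = [0.0] * len(past[0])
    for k, m in enumerate(mats):
        for (i, j), v in m.items():
            s[i] += v * past[k][j]
    return s


def build_slices(mats):
    """Regroup entries by column j: slices[j][i] = (k_start, dense row over k)."""
    by_pair = {}
    for k, m in enumerate(mats):
        for (i, j), v in m.items():
            by_pair.setdefault((i, j), {})[k] = v
    slices = {}
    for (i, j), kv in by_pair.items():
        k0, k1 = min(kv), max(kv)
        # Any gap in k is stored as an explicit zero so the row stays dense.
        row = [kv.get(k, 0.0) for k in range(k0, k1 + 1)]
        slices.setdefault(j, {})[i] = (k0, row)
    return slices


def slice_sum(slices, past):
    """Same result as spmv_sum, computed as dense row-times-history products."""
    s = [0.0] * len(past[0])
    for j, rows in slices.items():
        hist = [p[j] for p in past]  # values of unknown j over the time steps
        for i, (k0, row) in rows.items():
            s[i] += sum(v * hist[k0 + t] for t, v in enumerate(row))
    return s


# Tiny example with three interaction matrices and two unknowns.
mats = [{(0, 0): 1.0, (1, 0): 2.0},
        {(1, 0): 3.0, (0, 1): 4.0},
        {(0, 1): 5.0}]
past = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # past[k] plays the role of a^(n-k)
print(spmv_sum(mats, past))                   # [47.0, 11.0]
print(slice_sum(build_slices(mats), past))    # [47.0, 11.0]
```

The point of the regrouping is that the inner loop becomes a dense dot product over contiguous data, which is the kind of access pattern that blocking schemes on GPUs can exploit.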

Publisher
Database: Elsevier - ScienceDirect
Journal: Parallel Computing - Volume 49, November 2015, Pages 66–82
Authors
, , ,