Article Code | Journal Code | Publication Year | Article (English) | Full-Text Version |
---|---|---|---|---|
523944 | 868533 | 2011 | 23-page PDF | Free download |

A hybrid message-passing and shared-memory parallelization technique is presented for improving the scalability of the adaptive integral method (AIM), an FFT-based algorithm, on clusters of identical multi-core processors. The proposed hybrid MPI/OpenMP parallelization scheme is based on a nested one-dimensional (1-D) slab decomposition of the 3-D auxiliary regular grid and of the associated AIM calculations: if there are M processors and T cores per processor, the scheme (i) divides the regular grid into M slabs and MT sub-slabs, (ii) assigns each slab/sub-slab and the associated operations to one of the processors/cores, and (iii) uses MPI for inter-processor data communication and OpenMP for intra-processor data exchange. The MPI/OpenMP-parallel AIM is used to accelerate the solution of the combined-field integral equation pertinent to the analysis of time-harmonic electromagnetic scattering from perfectly conducting surfaces. The scalability of the scheme is investigated theoretically and verified on a state-of-the-art multi-core cluster for benchmark scattering problems. Timing and speedup results on up to 1024 quad-core processors show that the hybrid MPI/OpenMP parallelization of AIM exhibits better strong scalability (fixed-problem-size speedup) than a pure MPI parallelization when multiple cores are used on each processor.
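To make the nested decomposition concrete, the following is a minimal hybrid MPI/OpenMP sketch, not the authors' implementation: the grid dimensions, the choice of z as the slab axis, and the stand-in per-point operation are illustrative assumptions (NZ is assumed divisible by the number of ranks), and a real AIM solver would replace the inner loop with the grid projection/interpolation stages and the final reduction with the communication of the distributed 3-D FFTs.

```c
/*
 * Minimal sketch of a nested 1-D slab decomposition (NOT the authors' code).
 * Build, e.g.:  mpicc -fopenmp slabs.c -o slabs
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NX 64   /* illustrative grid dimensions */
#define NY 64
#define NZ 64

int main(int argc, char **argv)
{
    /* FUNNELED: only the main thread makes MPI calls, matching the
     * scheme's use of MPI strictly for inter-processor communication. */
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, nprocs;                 /* nprocs plays the role of M */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Level 1 (MPI): each rank owns one z-slab, planes [z0, z0 + nz_loc). */
    int nz_loc = NZ / nprocs;
    int z0     = rank * nz_loc;
    double *slab = malloc((size_t)nz_loc * NY * NX * sizeof *slab);

    #pragma omp parallel
    {
        /* Level 2 (OpenMP): T threads split the slab into T contiguous
         * sub-slabs (static schedule); intra-processor "data exchange"
         * is simply shared memory. */
        #pragma omp master
        if (rank == 0)
            printf("%d ranks x %d threads = %d sub-slabs\n",
                   nprocs, omp_get_num_threads(),
                   nprocs * omp_get_num_threads());

        #pragma omp for schedule(static)
        for (int z = 0; z < nz_loc; ++z)
            for (int y = 0; y < NY; ++y)
                for (int x = 0; x < NX; ++x)
                    /* stand-in for a per-grid-point AIM stage */
                    slab[((size_t)z * NY + y) * NX + x] = (double)(z0 + z);
    }

    /* Inter-processor data movement (in AIM, the distributed 3-D FFTs)
     * goes through MPI; a simple reduction stands in here. */
    double local = slab[0], global;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    free(slab);
    MPI_Finalize();
    return 0;
}
```

With M ranks and T threads this produces the MT contiguous sub-slabs of the scheme, so adding cores increases parallelism without increasing the number of MPI messages, consistent with the latency-limited argument in the highlights below.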
► Scalability of the classical iterative method of moments (MOM) and the fast AIM analyzed on multi-core clusters.
► Effectiveness of hybrid MPI/OpenMP parallelization is investigated theoretically.
► A hybrid parallelization of AIM based on a nested 1-D grid decomposition is proposed.
► Hybrid parallelization always improves the scalability of the AIM memory requirement.
► It also improves the scalability of the AIM matrix-solve time when that time is latency limited.
► Practical performance of hybrid parallelization of MOM and AIM is demonstrated.
► Scattering from three benchmark structures analyzed on multi-core cluster Ranger.
► Simulations are conducted for up to N ∼ 10⁷ degrees of freedom on up to 4096 cores.
► Hybrid parallelization is shown to alleviate memory and communication limitations.
► Results are in line with the theoretical analysis.
Journal: Parallel Computing - Volume 37, Issues 6–7, June–July 2011, Pages 279–301