Article code: 432377
Journal code: 688869
Publication year: 2013
English article: 12-page PDF
Full text: free download
English title of the ISI article
Generating data transfers for distributed GPU parallel programs
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract


• We automatically generate heterogeneous communications for distributed-memory architectures.
• Communication generation is based on static compiler analysis and runtime decisions.
• Accurate heterogeneous communications are generated for regular applications.
• Heterogeneous communications cover both data transfers between the CPU and the GPU accelerator and message passing between CPUs.

High-performance applications now exploit multi-level architectures, because hardware accelerators such as GPUs are present inside each computing node. Data transfers occur at two different levels: inside the computing node, between the CPU and the accelerators, and between computing nodes. We consider the case where intra-node parallelism is handled with HMPP compiler directives and inter-node communications are programmed with MPI message passing. Programming such a heterogeneous architecture this way is costly and error-prone. In this paper, we specifically demonstrate the transformation of HMPP programs designed to exploit a single computing node equipped with a GPU into a heterogeneous HMPP + MPI program exploiting multiple GPUs located on different computing nodes. The STEP tool focuses on generating communications, combining powerful static analyses with runtime execution to reduce the volume of communications. Our source-to-source transformation is implemented inside the PIPS workbench. We detail the generated source program of the Jacobi kernel and show that the execution times and speedups are encouraging. Finally, we give some directions for improving the tool.
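To make the idea of reducing communication volume concrete, here is a minimal sketch of a 1-D Jacobi iteration split across several workers. It is not the code generated by STEP, and it uses neither HMPP nor MPI: the workers are simulated with plain Python lists, and all function names (`jacobi_sweep`, `partition`, `exchange_halos`, `distributed_jacobi`) are hypothetical. The point it illustrates is that each worker only needs the edge cells of its neighbours before every sweep, so two values cross each internal boundary per step instead of the whole array.

```python
def jacobi_sweep(local):
    """Average each inner cell with its neighbours; the two end cells are kept as-is."""
    inner = [0.5 * (local[i - 1] + local[i + 1]) for i in range(1, len(local) - 1)]
    return [local[0]] + inner + [local[-1]]


def partition(u, workers):
    """Split the interior of u into contiguous blocks, each padded with one halo cell per side."""
    interior = len(u) - 2
    assert interior >= workers, "every worker must own at least one cell"
    bounds = [1 + (interior * k) // workers for k in range(workers + 1)]
    return [u[bounds[k] - 1 : bounds[k + 1] + 1] for k in range(workers)]


def exchange_halos(blocks):
    """Refresh halo cells from neighbouring blocks; return the number of values moved."""
    moved = 0
    for k in range(len(blocks) - 1):
        left, right = blocks[k], blocks[k + 1]
        left[-1] = right[1]   # right neighbour's first owned cell fills left's halo
        right[0] = left[-2]   # left neighbour's last owned cell fills right's halo
        moved += 2
    return moved


def distributed_jacobi(u, workers, steps):
    """Run `steps` Jacobi sweeps on partitioned data; return (result, values exchanged)."""
    blocks = partition(u, workers)
    moved = 0
    for _ in range(steps):
        moved += exchange_halos(blocks)
        blocks = [jacobi_sweep(b) for b in blocks]
    result = [u[0]]
    for b in blocks:
        result.extend(b[1:-1])  # gather owned cells only, dropping halos
    result.append(u[-1])
    return result, moved


def serial_jacobi(u, steps):
    """Reference: the same sweeps on the undivided array."""
    for _ in range(steps):
        u = jacobi_sweep(u)
    return u
```

In a real HMPP + MPI setting, each halo copy in `exchange_halos` would presumably correspond to a transfer from the GPU to the host followed by a message between nodes; the sketch only counts the values that would have to move.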

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 12, December 2013, Pages 1649–1660