Article code: 431816
Journal code: 688634
Year of publication: 2013
English article: 14-page PDF
Full-text version: Free download
English title of the ISI article
Trellis: Portability across architectures with a high-level framework
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract


• Trellis shows the programmability benefits of a common and portable set of directives.
• We illustrate the descriptive capability of directives that can support portable codes.
• We enhance the OpenACC model with more efficient mapping and synchronization.
• We implement a prototype source translation of Trellis to OpenMP, OpenACC and CUDA.

The increasing computational needs of parallel applications inevitably require portability across parallel architectures, which now include heterogeneous processing resources, such as CPUs and GPUs, and multiple SIMD/SIMT widths. However, the lack of a common parallel programming paradigm that provides predictable, near-optimal performance on each resource leads to the use of low-level frameworks with architecture-specific optimizations, which in turn cause the code base to diverge and make porting difficult. Our experience with parallel applications and frameworks leads us to conclude that achieving performance portability requires a common set of high-level directives and an efficient mapping onto each architecture.

To demonstrate this concept, we develop Trellis, a prototype programming framework that allows the programmer to maintain a single generic, structured codebase that executes efficiently on both the CPU and the GPU. Our approach annotates such code with a single set of high-level directives, derived from both OpenMP and OpenACC, that is compatible with both architectures. Most importantly, motivated by the limitations of the OpenACC compiler in transforming such code into a GPU kernel, we introduce a thread synchronization directive and a set of transformation techniques that allow us to obtain GPU code with the desired parallelization and better performance.

While a common high-level programming framework for both CPU and GPU is not yet available, our analysis shows that even obtaining the best-case GPU performance with OpenACC, the state-of-the-art solution, requires modifying the structure of the code to properly exploit braided parallelism and to cope with conditional statements or serial sections. Although this already requires prior knowledge of compiler behavior, optimal performance remains unattainable because of the lack of synchronization.

We describe the contributions of Trellis in addressing these problems by showing how it achieves correct parallelization of the original codes for three parallel applications, with performance competitive with that of OpenMP and CUDA, improved programmability, and reduced overall code length.
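
To make the notions of braided parallelism and intra-task synchronization more concrete, the following is a minimal sketch in C. It is not taken from the paper and does not use Trellis syntax, which the abstract does not specify; it relies only on standard OpenMP directives. The function name `normalize_rows` and the row-normalization task are hypothetical, chosen purely for illustration. Each row is an independent task (coarse-grained parallelism), the loops within a row are data-parallel (fine-grained parallelism), and the second stage depends on the result of the first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel (not from the paper): divide every
 * element of each row by that row's sum. Rows are independent tasks;
 * the loops inside a row are data-parallel. */
void normalize_rows(float *data, size_t rows, size_t cols)
{
    assert(data != NULL && cols > 0);

    /* Coarse level: independent rows are distributed across CPU
     * threads. On a GPU, each row would typically map to a thread block. */
    #pragma omp parallel for
    for (long r = 0; r < (long)rows; ++r) {
        float *row = data + (size_t)r * cols;
        float sum = 0.0f;

        /* Fine level, stage 1: data-parallel reduction over the row. */
        #pragma omp simd reduction(+:sum)
        for (size_t c = 0; c < cols; ++c)
            sum += row[c];

        /* Stage 2 reads the completed sum. On a GPU, the threads that
         * cooperate on one row would need a barrier at this point
         * (for example __syncthreads() in CUDA). */
        if (sum == 0.0f)
            continue; /* conditional path: leave an all-zero row as-is */

        /* Fine level, stage 2: data-parallel scaling of the row. */
        #pragma omp simd
        for (size_t c = 0; c < cols; ++c)
            row[c] /= sum;
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result, which is the sense in which a single annotated source can serve more than one target. The dependence between the two stages is the kind of point at which a GPU mapping needs synchronization among the threads sharing a task.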

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 10, October 2013, Pages 1400–1413