Article code: 502463
Journal code: 863706
Publication year: 2010
English article: 16 pages, PDF
Full-text version: Free download
English title of the ISI article
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
Related subjects
Engineering and Basic Sciences · Chemistry · Theoretical and Practical Chemistry
English abstract

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications.

Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel computing clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs.
RAM: Tested on problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of the shallow water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D shallow water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.
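The class of solvers the abstract describes maps one grid cell to one data-parallel work item in an explicit stencil update. The sketch below is not SWsolver's high-resolution scheme; it is a minimal illustrative CUDA kernel, assuming a simple first-order Lax-Friedrichs update of the 2D shallow water equations in single precision, with hypothetical names (swStep, launch_step, GRAV) and a structure-of-arrays layout with one ghost layer per side, to show the kind of data layout and per-cell data parallelism the paper optimizes.

/* Illustrative sketch only: first-order Lax-Friedrichs update of the 2D
 * shallow water equations, one CUDA thread per interior cell. Assumes
 * h > 0 everywhere (no dry cells). Names and layout are hypothetical. */
#include <cuda_runtime.h>

#define GRAV 9.81f

/* conserved variables (h, hu, hv) stored as separate nx*ny arrays,
 * row-major, with one ghost cell on every side */
__global__ void swStep(const float *h, const float *hu, const float *hv,
                       float *h_new, float *hu_new, float *hv_new,
                       int nx, int ny, float dt, float dx, float dy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < 1 || i >= nx - 1 || j < 1 || j >= ny - 1) return; /* skip ghost cells */

    int c = j * nx + i, e = c + 1, w = c - 1, n = c + nx, s = c - nx;

    /* x-direction fluxes F(U) at east/west neighbours */
    float fe_h  = hu[e];
    float fe_hu = hu[e] * hu[e] / h[e] + 0.5f * GRAV * h[e] * h[e];
    float fe_hv = hu[e] * hv[e] / h[e];
    float fw_h  = hu[w];
    float fw_hu = hu[w] * hu[w] / h[w] + 0.5f * GRAV * h[w] * h[w];
    float fw_hv = hu[w] * hv[w] / h[w];

    /* y-direction fluxes G(U) at north/south neighbours */
    float gn_h  = hv[n];
    float gn_hu = hu[n] * hv[n] / h[n];
    float gn_hv = hv[n] * hv[n] / h[n] + 0.5f * GRAV * h[n] * h[n];
    float gs_h  = hv[s];
    float gs_hu = hu[s] * hv[s] / h[s];
    float gs_hv = hv[s] * hv[s] / h[s] + 0.5f * GRAV * h[s] * h[s];

    float ax = 0.5f * dt / dx, ay = 0.5f * dt / dy;

    /* Lax-Friedrichs: neighbour average minus centred flux differences */
    h_new[c]  = 0.25f * (h[e]  + h[w]  + h[n]  + h[s])  - ax * (fe_h  - fw_h)  - ay * (gn_h  - gs_h);
    hu_new[c] = 0.25f * (hu[e] + hu[w] + hu[n] + hu[s]) - ax * (fe_hu - fw_hu) - ay * (gn_hu - gs_hu);
    hv_new[c] = 0.25f * (hv[e] + hv[w] + hv[n] + hv[s]) - ax * (fe_hv - fw_hv) - ay * (gn_hv - gs_hv);
}

/* host-side launch of one explicit time step (error checking omitted) */
void launch_step(const float *d_h, const float *d_hu, const float *d_hv,
                 float *d_hn, float *d_hun, float *d_hvn,
                 int nx, int ny, float dt, float dx, float dy)
{
    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    swStep<<<grid, block>>>(d_h, d_hu, d_hv, d_hn, d_hun, d_hvn, nx, ny, dt, dx, dy);
}

After the kernel completes, the host swaps the old and new arrays and repeats; the same per-cell structure carries over to CPU SIMD or Cell SPE implementations, with the data-layout and data-flow details the paper discusses.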

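At the coarsest level the abstract states that MPI connects the cluster nodes. The sketch below is a minimal halo-exchange example, assuming a simple 1D strip decomposition of the grid across ranks and a row-major layout with one ghost row above and below each local block; the function name exchange_halos and the decomposition are illustrative assumptions, not SWsolver's actual communication scheme.

/* Illustrative sketch only: exchange one-row ghost layers between
 * neighbouring MPI ranks before each explicit time step. */
#include <mpi.h>

/* u: local block of (local_ny + 2) rows by nx columns, row-major;
 * rows 0 and local_ny + 1 are ghost rows filled from the neighbours */
static void exchange_halos(float *u, int nx, int local_ny, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* send first interior row up, receive into bottom ghost row */
    MPI_Sendrecv(&u[1 * nx],              nx, MPI_FLOAT, up,   0,
                 &u[(local_ny + 1) * nx], nx, MPI_FLOAT, down, 0,
                 comm, MPI_STATUS_IGNORE);

    /* send last interior row down, receive into top ghost row */
    MPI_Sendrecv(&u[local_ny * nx],       nx, MPI_FLOAT, down, 1,
                 &u[0],                   nx, MPI_FLOAT, up,   1,
                 comm, MPI_STATUS_IGNORE);
}

On an accelerated node, the ghost rows would additionally be copied between host and device (or main memory and SPE local store) around this exchange, which is one of the data-flow costs the paper's node-level comparisons account for.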
Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Physics Communications - Volume 181, Issue 12, December 2010, Pages 2164–2179
Authors