Optimization Techniques for 3D-FWT on Systems with Manycore GPUs and Multicore CPUs

Article ID	Journal	Published Year	Pages	File Type
490505	Procedia Computer Science	2013	10 Pages	PDF

Abstract

Programming manycore GPUs or multicore CPUs for high performance requires a careful balance of several hardware-specific factors, which expert users typically achieve through trial and error. To reduce the time spent on manual optimization, general guidelines can be followed or different metrics can be used to predict performance, but a trial-and-error process ultimately remains prevalent. In this paper, we present an optimization method to run the 3D Fast Wavelet Transform (3D-FWT) on hybrid systems. The optimization engine detects the platforms available on a system and executes the appropriate kernel, implemented in CUDA or OpenCL for GPUs and with pthreads for CPUs. Moreover, the proposed method automatically selects parameters such as the block size, the work-group size, or the number of threads to reduce the execution time, obtaining optimal performance in many cases. Finally, the optimization engine distributes proportional parts of a video sequence so that they run concurrently on all platforms of the system. Compared with a typical user who sends all frames to a GPU running a CUDA or OpenCL version of the 3D-FWT, the method achieves average speedups of up to 7.93.
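
The proportional distribution of frames described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `split_frames`, the device labels, and the throughput figures are hypothetical, and the sketch assumes that each platform's throughput (frames per second) has already been measured, for example during a short tuning run. Rounding uses the largest-remainder rule so that every frame is assigned exactly once.

```python
def split_frames(total_frames, throughput):
    """Split total_frames across devices in proportion to their throughput.

    throughput maps a device label to a measured rate in frames per second
    (hypothetical values; the abstract does not state how rates are obtained).
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if not throughput or any(rate <= 0 for rate in throughput.values()):
        raise ValueError("every device needs a positive throughput")

    total_rate = sum(throughput.values())
    # Ideal (fractional) share of frames for each device.
    exact = {dev: total_frames * rate / total_rate
             for dev, rate in throughput.items()}
    # Start from the integer part of each share.
    shares = {dev: int(x) for dev, x in exact.items()}
    # Hand the leftover frames to the devices with the largest remainders.
    leftover = total_frames - sum(shares.values())
    by_remainder = sorted(exact, key=lambda dev: exact[dev] - shares[dev],
                          reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical rates: one fast GPU, one slower GPU, one multicore CPU.
    rates = {"gpu_cuda": 6.0, "gpu_opencl": 1.0, "cpu_pthreads": 1.0}
    print(split_frames(64, rates))
```

With these assumed rates, a 64-frame sequence is split 48/8/8, so all three platforms would finish at roughly the same time if the measured rates hold. A real engine would also need to respect any constraint the transform places on how frames are grouped, which this sketch ignores.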