Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
490505 | Procedia Computer Science | 2013 | 10 Pages |
Programming manycore GPUs or multicore CPUs for high performance requires a careful balance of several hardware-specific factors, which expert users typically achieve through trial and error. To reduce the hand-tuning time required to reach optimal performance, general guidelines can be followed or different metrics can be considered to predict performance, but ultimately a trial-and-error process still prevails. In this paper, we present an optimization method to run the 3D Fast Wavelet Transform (3D-FWT) on hybrid systems. The optimization engine detects the different platforms found on a system and executes the appropriate kernel, implemented in both CUDA and OpenCL for GPUs and with pthreads for CPUs. Moreover, the proposed method automatically selects parameters such as the block size, the work-group size, or the number of threads to reduce the execution time, obtaining the optimal performance in many cases. Finally, the optimization engine sends proportionally sized parts of a video sequence to run concurrently on all platforms of the system. Compared with a normal user, who sends all frames to a GPU running a CUDA or OpenCL version of the 3D-FWT, the method achieves average speedups of up to 7.93.
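The proportional distribution of frames can be illustrated with a minimal sketch. The abstract does not say how the proportions are computed, so this example assumes each platform's share is proportional to a throughput (frames per second) measured in a short calibration run, and that the sequence can be cut at any frame boundary. The function name `split_frames`, the device labels, and the throughput figures are hypothetical and are not taken from the paper.

```python
from typing import Dict


def split_frames(total_frames: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Divide total_frames among devices in proportion to their throughput.

    `throughput` maps a device label to an assumed frames-per-second figure,
    e.g. obtained from a short calibration run (an assumption of this sketch).
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if not throughput or any(t <= 0 for t in throughput.values()):
        raise ValueError("every device needs a positive throughput")

    total = sum(throughput.values())
    # Ideal (possibly fractional) share for each device.
    ideal = {dev: total_frames * t / total for dev, t in throughput.items()}
    # Start from the integer part of each share.
    shares = {dev: int(x) for dev, x in ideal.items()}
    # Give the leftover frames to the devices with the largest remainders,
    # so the shares always add up to total_frames.
    leftover = total_frames - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda dev: ideal[dev] - shares[dev], reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical throughputs, for illustration only.
    measured = {"gpu_cuda": 6.0, "gpu_opencl": 1.5, "cpu_pthreads": 0.5}
    print(split_frames(64, measured))
```

With these illustrative figures, the faster GPU receives most of the 64 frames while the slower GPU and the CPU still process a small share at the same time, which is the idea behind running all platforms concurrently rather than sending every frame to a single GPU.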