FIPIP: A novel fine-grained parallel partition based intra-frame prediction on heterogeneous many-core systems

Article ID	Journal	Published Year	Pages	File Type
4950279	Future Generation Computer Systems	2018	14 Pages	PDF

Abstract

• A fine-grained parallelism for intra prediction based on GPU is proposed.
• It is the first to promote intra prediction to pixel-level parallelism based on GPU.
• A new regular prediction formula is presented for parallelism.
• Two optimized encoding orders are adopted for multi-level parallelism.
• An efficient self-synchronizing method is presented for task scheduling.

Intra-frame prediction is an important and time-consuming component of the widely used H.264/AVC encoder. One promising way to speed up prediction is to introduce parallelism, and many approaches based on heterogeneous many-core systems have been proposed. Most of these approaches, however, are limited by their use of highly irregular prediction formulas, which require a significant number of branch instructions. They also use only coarse-grained parallel partition, which treats blocks or sub-regions of images as the parallel processing units. In this paper, by contrast, we propose a fine-grained intra-frame prediction approach based on parallel partition (FIPIP) and implement it on graphics processing unit (GPU) based heterogeneous many-core systems. The approach is characterized by the following aspects. First, it takes individual pixels, instead of blocks, as the parallel processing units. Pixel-level parallelism can fully exploit the computational power of heterogeneous GPU-based systems and hence greatly reduces the encoding time. Second, we unify the irregular prediction formulas of intra-frame prediction into a single uniform formula and propose a table-lookup method to perform intra-frame prediction efficiently. The formula eliminates unnecessary branch instructions by using a unified predictor array, which significantly improves the efficiency of the fine-grained parallel partition. Third, two optimized encoding orders, assisted by an improved combined frame strategy, are adopted to implement multi-level parallelism. Finally, an efficient self-synchronizing method is realized for fine-grained task scheduling on heterogeneous CPU-GPU architectures. We apply FIPIP to encode a set of benchmark videos under varying conditions and compare it with other popular intra-frame prediction methods. Results show that FIPIP outperforms existing state-of-the-art work, with speedup factors of 2 to 6.
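To make the idea of a uniform, table-driven prediction formula concrete, the following is a minimal sketch in Python. It is not the formula from the paper, which the abstract does not give. It assumes a hypothetical 13-entry predictor array layout and covers only three H.264 Intra_4x4 modes (vertical, horizontal, diagonal down-left) that happen to fit one 3-tap expression; DC and the remaining modes are left out. The names `build_table`, `predict_pixel` and `predict_block` are illustrative.

```python
# Hypothetical layout of the predictor array P for one 4x4 block (an assumption):
#   P[0..7]  = row above the block (top and top-right neighbours)
#   P[8..11] = column left of the block
#   P[12]    = top-left corner neighbour (unused by the three modes below)
VERTICAL, HORIZONTAL, DIAG_DOWN_LEFT = 0, 1, 3  # H.264 Intra_4x4 mode numbers


def build_table():
    """Precompute, for every (mode, y, x), three indices (a, b, c) into P."""
    table = {}
    for y in range(4):
        for x in range(4):
            # Copy modes: using the same index three times makes the
            # 3-tap filter return the neighbour unchanged.
            table[(VERTICAL, y, x)] = (x, x, x)
            table[(HORIZONTAL, y, x)] = (8 + y, 8 + y, 8 + y)
            # Diagonal down-left: taps slide along the top row; clamping at
            # index 7 reproduces the special case for the bottom-right pixel.
            k = x + y
            table[(DIAG_DOWN_LEFT, y, x)] = (k, min(k + 1, 7), min(k + 2, 7))
    return table


TABLE = build_table()


def predict_pixel(P, mode, y, x):
    """One branch-free expression for every pixel; only the lookup varies."""
    a, b, c = TABLE[(mode, y, x)]
    return (P[a] + 2 * P[b] + P[c] + 2) >> 2


def predict_block(P, mode):
    """Each pixel is independent, so each call could map to one GPU thread."""
    return [[predict_pixel(P, mode, y, x) for x in range(4)] for y in range(4)]


if __name__ == "__main__":
    P = [10, 20, 30, 40, 50, 60, 70, 80, 100, 110, 120, 130, 5]
    for row in predict_block(P, DIAG_DOWN_LEFT):
        print(row)
```

Because the per-pixel work is a lookup followed by the same arithmetic regardless of mode, threads that handle different pixels or modes would follow the same instruction path, which is the property the abstract attributes to the unified predictor array.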

Keywords

H.264/AVC; Fast mode decision; Parallelism; GPU