Article ID | Journal ID | Year | Full text |
---|---|---|---|
432747 | 689058 | 2013 | 9-page PDF (English) |

Face detection is a key component of applications such as security surveillance and human–computer interaction systems, and real-time operation is essential in many of these scenarios. The Viola–Jones algorithm is an attractive way to meet this real-time requirement and has been widely implemented on custom hardware, FPGAs, and GPUs. We present a GPU implementation that achieves competitive performance at low development cost. Our solution addresses the irregularity inherent in the algorithm, where different image windows exit the detection cascade after different numbers of stages, using a novel dynamic warp scheduling approach that eliminates thread divergence. The scheme also employs a thread pool mechanism, which significantly reduces the cost of creating, switching, and terminating threads. Compared with static thread scheduling, our dynamic warp scheduling approach reduces execution time by a factor of 3. To maximize detection throughput, we also run the detector on multiple GPUs, achieving 95.6 frames per second (FPS) on five Fermi GPUs.
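The irregularity comes from the cascade's early exit: most windows are rejected after one or two stages, while a few face candidates run through many. The Python sketch below is a simplified, hypothetical cost model, not the authors' CUDA implementation. It counts lockstep "warp-steps" when windows are statically bound to 32-lane warps, versus when lanes pull a new window from a shared queue as soon as their current one exits, in the spirit of the persistent thread pool described above. The 32-lane warp width is NVIDIA's; the 20-stage cascade, the pass probability, and all function names are illustrative assumptions.

```python
import math
import random

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs

def sample_depths(n_windows, n_stages=20, pass_prob=0.5, seed=0):
    """Hypothetical cascade model (assumed parameters): every window evaluates
    stage 1 and advances to each next stage with probability pass_prob.
    Returns the number of stages evaluated per window."""
    rng = random.Random(seed)
    depths = []
    for _ in range(n_windows):
        d = 1
        while d < n_stages and rng.random() < pass_prob:
            d += 1
        depths.append(d)
    return depths

def static_warp_steps(depths, warp_size=WARP_SIZE):
    """Static scheduling: windows are bound to warps in fixed groups, and each
    group costs as many steps as its deepest window; early-exiting lanes idle."""
    return sum(max(depths[i:i + warp_size])
               for i in range(0, len(depths), warp_size))

def dynamic_warp_steps(depths, warp_size=WARP_SIZE):
    """Dynamic refill: one persistent warp whose lanes fetch the next window
    from a shared queue as soon as their current window exits the cascade."""
    next_idx = min(warp_size, len(depths))
    lanes = list(depths[:next_idx])  # remaining stages for each active lane
    steps = 0
    while lanes:
        steps += 1  # one lockstep stage evaluation across the whole warp
        refilled = []
        for remaining in lanes:
            remaining -= 1
            if remaining > 0:
                refilled.append(remaining)
            elif next_idx < len(depths):  # lane is free: pull new work
                refilled.append(depths[next_idx])
                next_idx += 1
        lanes = refilled
    return steps

if __name__ == "__main__":
    depths = sample_depths(100_000)
    print("static warp-steps: ", static_warp_steps(depths))
    print("dynamic warp-steps:", dynamic_warp_steps(depths))
    print("lower bound:       ", math.ceil(sum(depths) / WARP_SIZE))
```

Both functions count the total lockstep iterations a single warp would execute, so they are directly comparable. The static version pays for every idle lane in a group, while the refill version stays close to the lower bound of total stages divided by 32. The numbers this toy model prints are not the paper's measurements.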
► We introduce a novel technique to load-balance the Viola–Jones algorithm.
► We compare different implementations of the Viola–Jones algorithm.
► We conduct experiments on both pre-Fermi and Fermi architectures.
► We run the face detector on multiple GPUs.
► We demonstrate that the new technique utilizes GPU resources better than static thread scheduling.
Journal: Journal of Parallel and Distributed Computing - Volume 73, Issue 5, May 2013, Pages 677–685