Video fingerprinting using Latent Dirichlet Allocation and facial images

Article ID	Journal	Published Year	Pages	File Type
530693	Pattern Recognition	2012	10 Pages	PDF

Abstract

This paper investigates whether latent aspects of a video can be extracted in order to develop a video fingerprinting framework. Semantic visual information about humans, specifically face occurrences in video frames, is used for this purpose together with a generative probabilistic model, Latent Dirichlet Allocation (LDA). The latent variables, namely the video topics, are modeled as a mixture of distributions of faces in each video. The method also clusters the detected faces using the Scale-Invariant Feature Transform (SIFT) and adapts the bag-of-words concept into a bag-of-faces one, in order to ensure exchangeability between topic distributions. Experimental results on three different data sets show low misclassification rates, on the order of 2%, and false rejection rates of 0%. These rates provide evidence that the proposed method performs effectively for video fingerprinting.

► Latent Dirichlet Allocation for perceptual hashing.
► Scale-Invariant Feature Transform (SIFT) based face clustering.
► Faceword definition for video content.
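
The bag-of-faces idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that face detection and SIFT-based clustering have already assigned each detected face an integer cluster id (a "faceword"), it fits LDA with a small collapsed Gibbs sampler instead of whatever inference the paper uses, and it treats the per-video topic mixture as the fingerprint, compared by L1 distance. The function names, the toy faceword sequences, and the hyperparameters `alpha`, `beta`, and `n_iter` are all hypothetical.

```python
import random


def bag_of_faces(face_labels, vocab_size):
    """Count how often each faceword (face-cluster id) occurs in one video."""
    counts = [0] * vocab_size
    for w in face_labels:
        counts[w] += 1
    return counts


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.5, beta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic mixture per video."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]             # topic counts per video
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # faceword counts per topic
    topic_total = [0] * n_topics
    assign = []

    # Random initial topic assignment for every faceword occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_d.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed per-video topic proportions, used here as the fingerprint.
    thetas = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        thetas.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return thetas


def l1_distance(p, q):
    """L1 distance between two topic mixtures (an assumed matching score)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Toy faceword sequences: hypothetical cluster ids, one list per video.
videos = {
    "original": [0, 0, 1, 1, 0, 1, 0, 1],
    "copy": [0, 1, 0, 1, 1, 0, 0, 1],
    "other": [2, 3, 3, 2, 2, 3, 2, 3],
}
names = list(videos)
thetas = lda_gibbs([videos[n] for n in names], n_topics=2, vocab_size=4)
fingerprints = dict(zip(names, thetas))

print(bag_of_faces(videos["original"], 4))
for name in names:
    print(name, [round(t, 3) for t in fingerprints[name]])
print(l1_distance(fingerprints["original"], fingerprints["copy"]))
```

In this sketch all videos are fitted in a single run because topic indices from separate runs are not aligned, so fingerprints are only comparable within one fitted model. A practical system would instead fit the topics once and infer the mixture of each query video against that fixed model.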

Keywords

Latent Dirichlet allocation; Video fingerprinting; Perceptual hashing