q-Gaussian mixture models for image and video semantic indexing

Journal: Journal of Visual Communication and Image Representation
Published: 2013
Pages: 8

Abstract

• We propose a q-Gaussian mixture model (q-GMM) for image and video semantic indexing.
• The q-GMM has a parameter q that controls its tail-heaviness.
• The q-GMM is more suitable than a GMM for representing images and videos.
• Our proposed method outperformed bag-of-visual-words on the PASCAL VOC and TRECVID datasets.

Gaussian mixture models (GMMs), which extend Bag-of-Visual-Words (BoW) to a probabilistic framework, have proven effective for image and video semantic indexing. Recently, the q-Gaussian distribution, derived from Tsallis statistics [11], has been shown to be useful for representing patterns in many complex physical systems. We propose q-Gaussian mixture models (q-GMMs), mixtures of q-Gaussian distributions whose parameter q controls their tail-heaviness, for image and video semantic indexing [1]. The long-tailed distributions obtained for q > 1 are expected to represent complexly correlated data effectively and hence to improve robustness against outliers. The main improvements over our previous study [1] are a q-GMM super-vector representation for computing the q-GMM kernel efficiently and a detailed experimental analysis comparing accuracy and testing cost with recent kernel methods. Our proposed method outperformed BoW, achieving a Mean Average Precision of 49.42% on PASCAL VOC 2010 and 10.90% on the TRECVID 2010 Semantic Indexing task.
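The abstract does not state the density itself, so the following is only a minimal one-dimensional sketch of how q shapes the tails of a q-Gaussian and how a q-GMM density is assembled. It assumes the standard univariate q-Gaussian of Tsallis statistics, a scale parameterization beta = 1/(2 sigma^2) (chosen here so that q = 1 recovers the ordinary Gaussian N(mu, sigma^2)), and a single q shared by all mixture components. The function names are hypothetical, and the sketch does not reproduce the multivariate model or the q-GMM super-vector kernel described in the paper.

```python
import math
from typing import Sequence


def q_exp(x: float, q: float) -> float:
    """Tsallis q-exponential e_q(x) = [1 + (1 - q) x]_+^(1 / (1 - q)).

    Equals exp(x) in the limit q -> 1. Only non-positive x is used below.
    """
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Outside the support (happens for q < 1 far from the mean).
        return 0.0
    return base ** (1.0 / (1.0 - q))


def q_gauss_norm(q: float) -> float:
    """Normalizing constant C_q of e_q(-x^2) over the real line (valid for q < 3)."""
    if q >= 3.0:
        raise ValueError("the q-Gaussian is not normalizable for q >= 3")
    if q == 1.0:
        return math.sqrt(math.pi)
    if q < 1.0:
        a, b = 1.0 / (1.0 - q), (3.0 - q) / (2.0 * (1.0 - q))
        # lgamma avoids overflow of Gamma(a) when q is close to 1.
        ratio = math.exp(math.lgamma(a) - math.lgamma(b))
        return 2.0 * math.sqrt(math.pi) * ratio / ((3.0 - q) * math.sqrt(1.0 - q))
    a, b = (3.0 - q) / (2.0 * (q - 1.0)), 1.0 / (q - 1.0)
    ratio = math.exp(math.lgamma(a) - math.lgamma(b))
    return math.sqrt(math.pi) * ratio / math.sqrt(q - 1.0)


def q_gaussian_pdf(x: float, mu: float, sigma: float, q: float) -> float:
    """Univariate q-Gaussian density with the assumed scale beta = 1 / (2 sigma^2)."""
    beta = 1.0 / (2.0 * sigma ** 2)
    return math.sqrt(beta) / q_gauss_norm(q) * q_exp(-beta * (x - mu) ** 2, q)


def q_gmm_pdf(x: float, weights: Sequence[float], mus: Sequence[float],
              sigmas: Sequence[float], q: float) -> float:
    """Mixture of q-Gaussians sharing one q (weights assumed to sum to 1)."""
    return sum(w * q_gaussian_pdf(x, m, s, q)
               for w, m, s in zip(weights, mus, sigmas))


if __name__ == "__main__":
    # Density five scale units from the mean: q = 1 is the ordinary Gaussian,
    # while q = 1.5 has a power-law tail and assigns far more density there.
    for q in (1.0, 1.5):
        print(q, q_gaussian_pdf(5.0, 0.0, 1.0, q))
    print(q_gmm_pdf(0.0, [0.3, 0.7], [-2.0, 3.0], [1.0, 0.5], 1.5))
```

With these definitions, the q = 1.5 density at x = 5 comes out roughly 4,000 times larger than the Gaussian one, which illustrates why long-tailed components for q > 1 are expected to be less sensitive to outlying descriptors.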

Keywords

Tsallis statistics; Image classification; Gaussian mixture models; Semantic indexing; Codebook; Bag-of-Words