Article ID: 527286
Journal: Computer Vision and Image Understanding
Published Year: 2016
Pages: 14
File Type: PDF
Abstract

• We propose a novel framework for Relevance Feedback based on the Fisher Kernel.
• The Fisher Kernel representation makes it possible to capture temporal variation by using frame-based features.
• We experiment on a wide variety of scenarios and public datasets (genre classification on Blip10000, action recognition on UCF50 and UCF101, and daily activity recognition on ADL) and show that the proposed approach outperforms other state-of-the-art approaches.
• We demonstrate the generalization power of our approach: the framework does not depend on a particular type of content descriptor (experiments were run with text, visual and audio features).

This paper proposes a novel framework for Relevance Feedback based on the Fisher Kernel (FK). Specifically, we train a Gaussian Mixture Model (GMM) on the top retrieval results (without supervision) and use it to create a FK representation, which is therefore specialized in modelling the most relevant examples. We use the FK representation to explicitly capture temporal variation in video via frame-based features taken at different time intervals. While the GMM is being trained, the user selects from the top examples those that match what they are looking for. This feedback is used to train a Support Vector Machine on the FK representation, which is then applied to re-rank the top retrieved results. We show that our approach outperforms other state-of-the-art relevance feedback methods. Experiments were carried out on the Blip10000, UCF50, UCF101 and ADL standard datasets using a broad range of multi-modal content descriptors (visual, audio, and text).
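To make the FK representation more concrete, the following is a minimal sketch of how a variable-length sequence of frame descriptors can be encoded as a fixed-length Fisher vector. It is an illustration under stated assumptions, not the authors' implementation: the GMM is assumed to have diagonal covariances, its parameters are set by hand here (in the framework described above they would be learned from the top retrieval results), and only the gradient with respect to the Gaussian means is computed, using the commonly cited normalization by the number of frames and the square root of the mixture weight. The function names `posteriors` and `fisher_vector` and all numeric values are hypothetical. The SVM re-ranking step is omitted.

```python
import math

def posteriors(x, weights, means, sigmas):
    """Soft assignment of one frame descriptor x to each Gaussian (diagonal GMM)."""
    log_p = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xi, m, s in zip(x, mu, sd):
            ll += -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        log_p.append(ll)
    top = max(log_p)  # subtract the max for numerical stability
    exp_p = [math.exp(v - top) for v in log_p]
    total = sum(exp_p)
    return [v / total for v in exp_p]

def fisher_vector(frames, weights, means, sigmas):
    """Mean-gradient Fisher vector of a list of frame descriptors.

    The result has K * D entries (K Gaussians, D feature dimensions),
    independent of how many frames the video has.
    """
    T, K, D = len(frames), len(weights), len(means[0])
    acc = [[0.0] * D for _ in range(K)]
    for x in frames:
        gamma = posteriors(x, weights, means, sigmas)
        for k in range(K):
            for d in range(D):
                acc[k][d] += gamma[k] * (x[d] - means[k][d]) / sigmas[k][d]
    return [v / (T * math.sqrt(weights[k])) for k in range(K) for v in acc[k]]

# Hypothetical 2-component GMM over 2-D frame features (hand-set, not learned).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]

short_video = [[0.1, 0.2], [0.3, 0.1]]
long_video = [[0.9, 1.1], [1.0, 0.8], [0.2, 0.1], [1.2, 1.0], [0.0, 0.3]]

# Both videos map to vectors of the same length (K * D = 4).
print(len(fisher_vector(short_video, weights, means, sigmas)))
print(len(fisher_vector(long_video, weights, means, sigmas)))
```

Because every video is mapped to a vector of the same dimensionality regardless of its number of frames, a standard classifier such as an SVM can be trained on the user-labelled examples and applied to the remaining top results.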
