Large-scale video copy retrieval with temporal-concentration SIFT

Article ID	Journal	Published Year	Pages	File Type
405897	Neurocomputing	2016	9 Pages	PDF

Abstract

Scale-invariant feature transform (SIFT) features play an important role in multimedia content analysis tasks such as near-duplicate image and video retrieval. However, the storage and query costs of SIFT become prohibitive for large-scale databases. In this paper, SIFT features are tracked through the video and robustly encoded with temporal information to generate temporal-concentration SIFT (TCSIFT), which greatly reduces the number of local features, and hence the visual redundancy, while retaining the advantages of SIFT as far as possible. On the basis of TCSIFT, a novel framework for large-scale video copy retrieval is proposed in which retrieval and validation are carried out at the feature and frame levels. Experimental results on two datasets, CC_WEB_VIDEO and TRECVID, demonstrate that our method achieves comparable accuracy with compact storage and more efficient execution, and that it adapts to various video transformations.
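
The abstract does not spell out how features are tracked or encoded, so the following is only a minimal sketch of the general idea of collapsing a tracked feature into one temporally annotated descriptor. Everything specific in it is an assumption made for illustration: the function name `concentrate_tracks`, the greedy nearest-neighbour matching between consecutive frames, the `max_dist` threshold, and the use of the mean descriptor as the track summary. Toy 2-dimensional tuples stand in for real 128-dimensional SIFT descriptors.

```python
from math import dist


def _mean(descs):
    """Component-wise mean of a list of equal-length descriptor tuples."""
    n = len(descs)
    return tuple(sum(c) / n for c in zip(*descs))


def concentrate_tracks(frames, max_dist=0.5):
    """Collapse per-frame descriptors into one summary per track.

    frames: list of frames, each a list of descriptor tuples.
    Returns a list of (start_frame, end_frame, mean_descriptor),
    sorted by start and end frame.
    NOTE: hypothetical illustration, not the paper's TCSIFT procedure.
    """
    active = []    # tracks that were extended in the previous frame
    finished = []  # tracks that could not be extended any further
    for t, descriptors in enumerate(frames):
        unused = set(range(len(descriptors)))
        still_active = []
        for track in active:
            last = track["descs"][-1]
            # Greedy choice: closest descriptor not yet claimed by a track.
            best = min(unused, key=lambda i: dist(last, descriptors[i]),
                       default=None)
            if best is not None and dist(last, descriptors[best]) <= max_dist:
                track["descs"].append(descriptors[best])
                track["end"] = t
                unused.remove(best)
                still_active.append(track)
            else:
                finished.append(track)
        # Every unmatched descriptor starts a new track.
        for i in sorted(unused):
            still_active.append(
                {"start": t, "end": t, "descs": [descriptors[i]]})
        active = still_active
    finished.extend(active)
    summaries = [(tr["start"], tr["end"], _mean(tr["descs"]))
                 for tr in finished]
    return sorted(summaries, key=lambda s: (s[0], s[1]))


# Toy input: three frames holding five raw descriptors in total.
frames = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.1, 0.0), (1.0, 0.9)],
    [(0.1, 0.1)],
]
tracks = concentrate_tracks(frames)
raw_count = sum(len(f) for f in frames)
print(f"{raw_count} raw descriptors -> {len(tracks)} track summaries")
```

In this toy setting, storing one summary per track instead of one descriptor per frame is what reduces the feature count, and the start and end frames carry the temporal information. A real system would also need to handle occlusion, shot boundaries, and ambiguous matches, none of which this sketch attempts.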

Keywords

SIFT; Spatio-temporal features