Event patches: Mining effective parts for event detection and understanding

Article ID	Journal	Published Year	Pages	File Type
6957453	Signal Processing	2018	19 Pages	PDF

Abstract

Event detection, which aims to detect complex events in large video collections, has attracted growing interest recently. Previous approaches suffered from high computational costs in the extraction of multiple features and in classification. More recently, a discriminative CNN video representation method was proposed for event detection and achieved promising performance. However, this method samples video frames uniformly to build a global video representation, without considering that some parts of a video may be redundant or noisy for the task. Although a multirate sampling solution was later proposed to account for variation in the motion speed of video content, it remains unclear which parts of a video are most important for defining the event. In this paper, we propose to mine effective parts for event detection and understanding. After video segmentation, we mine the definite parts (Event Patches) that contribute most to defining an event. We evaluate our event patches on the TRECVID MED 2011 dataset. Compared with the CNN video representation method, which has been recognized as the best video representation for event detection, our method improves the Mean Average Precision (mAP) from 74.7% to 76.2%.
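
The mAP figure quoted above is an average, over event classes, of the per-event Average Precision of a ranked list of videos. The following is a minimal sketch of that metric only, assuming binary relevance labels and non-interpolated AP; the abstract does not state the exact evaluation protocol, and the function names and toy data are illustrative rather than taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one event.

    Videos are ranked by descending detection score; AP is the mean of
    precision@k taken at every rank k that holds a positive video.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    # An event with no positive videos contributes an AP of 0 (assumption).
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    per_event: Dict[str, Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """Mean of the per-event APs; `per_event` maps event name -> (scores, labels)."""
    if not per_event:
        raise ValueError("per_event must contain at least one event")
    aps = [average_precision(s, l) for s, l in per_event.values()]
    return sum(aps) / len(aps)


# Toy data with hypothetical event names, for illustration only.
toy = {
    "event_a": ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    "event_b": ([0.2, 0.7, 0.6], [0, 1, 1]),
}
toy_map = mean_average_precision(toy)
```

In the toy data, `event_a` has positives at ranks 1 and 3, giving an AP of (1/1 + 2/3) / 2, while `event_b` ranks both positives first and scores an AP of 1.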

Keywords

Video segmentation; Sampling rate; Video representation; Deep features