Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6957453 | Signal Processing | 2018 | 19 Pages | |
Abstract
Event detection, which aims to detect complex events among large numbers of videos, has attracted growing interest recently. Previous approaches suffered from heavy computational costs in their multiple feature extraction and classification steps. Recently, a discriminative CNN video representation method for event detection was proposed and achieved promising performance. However, this method samples video frames uniformly to build global video representations, without considering that some parts of a video may be redundant or noisy for the task. Although a multirate sampling solution was later proposed to account for variation in the motion speed of video content, it remains unclear which video parts are most important for defining an event. In this paper, we propose to mine effective parts for event detection and understanding. After video segmentation, we mine the specific parts (Event Patches) that contribute most to defining an event. We evaluate our event patches on the TRECVID MED 2011 dataset. Compared with the CNN video representation method, which has been recognized as the best video representation for event detection, our method improves the Mean Average Precision (mAP) from 74.7% to 76.2%.
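The reported gain is measured in mean average precision (mAP). As a rough illustration of what that metric computes, here is a minimal sketch of standard non-interpolated average precision, averaged over events. It assumes each event has a list of detection scores and binary relevance labels, one per test video; the event names and numbers in the example are hypothetical, and the official TRECVID MED scoring tools may differ in details such as tie handling.

```python
from typing import Dict, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision@k taken at each rank k holding a positive."""
    # Rank videos by descending detection score (stable sort keeps input order on ties).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    # Every video is ranked, so `hits` equals the total number of positives.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_event: Dict[str, Tuple[Sequence[float], Sequence[int]]]) -> float:
    """mAP: unweighted mean of the per-event AP values."""
    aps = [average_precision(scores, labels) for scores, labels in per_event.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical scores for two events over a handful of test videos.
    results = {
        "event_a": ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),  # AP = (1/1 + 2/3) / 2
        "event_b": ([0.2, 0.95, 0.5], [0, 1, 0]),         # AP = 1.0
    }
    print(f"mAP = {mean_average_precision(results):.4f}")
```

Because AP rewards placing positive videos near the top of the ranking, discarding redundant or noisy video parts can raise mAP even when the set of correctly detected videos barely changes.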
Related Topics
Physical Sciences and Engineering
Computer Science
Signal Processing
Authors
Wenlong Xie, Hongxun Yao, Sicheng Zhao, Xiaoshuai Sun, Tingting Han