Article ID | Journal | Published Year | Pages
---|---|---|---
527754 | Computer Vision and Image Understanding | 2013 | 13 Pages
In this paper we propose a novel method for continuous visual event recognition (CVER) on a large-scale video dataset using a max-margin Hough transform framework. Due to the large scale of the data, the diversity of real-world conditions, and the wide scene variability, direct application of action recognition/detection methods, such as spatio-temporal interest point (STIP)-based local feature techniques, to the whole dataset is practically infeasible. To address this problem, we apply a motion region extraction technique, based on motion segmentation and region clustering, to identify candidate "events of interest" as a preprocessing step. A STIP detector is then applied to these candidate regions and local motion features are computed. For activity representation we use a generalized Hough transform framework in which each feature point casts a weighted vote for a possible activity class centre. A max-margin framework is applied to learn the feature codebook weights. For activity detection, peaks in the Hough voting space are identified and an initial event hypothesis is generated using the spatio-temporal information of the participating STIPs. For event recognition, a verification Support Vector Machine (SVM) is used. An extensive evaluation on a large-scale benchmark video surveillance dataset (VIRAT), as well as on a small-scale benchmark dataset (MSR), shows that the proposed method is applicable to a wide range of continuous visual event recognition applications under extremely challenging conditions.
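The weighted Hough voting and peak detection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the nearest-neighbour codebook assignment, the uniform vote splitting across stored displacements, and the single-peak hypothesis are all simplifying assumptions.

```python
import numpy as np

# Hypothetical sketch of weighted Hough voting for activity class centres.
# Each local STIP feature is matched to a codebook word; the word's stored
# spatio-temporal displacements cast votes (scaled by a learned per-word
# weight, max-margin trained in the paper) into a discretized voting space.

def hough_votes(features, positions, codebook, displacements, weights, grid_shape):
    """Accumulate weighted votes for activity centres in a (X, Y, T) grid.

    features:      (N, D) local STIP descriptors
    positions:     (N, 3) spatio-temporal (x, y, t) location of each feature
    codebook:      (K, D) visual-word centres
    displacements: list of K arrays, each (M_k, 3), offsets word -> centre
    weights:       (K,) per-word vote weights
    grid_shape:    (X, Y, T) shape of the discretized voting space
    """
    acc = np.zeros(grid_shape)
    for f, p in zip(features, positions):
        k = np.argmin(np.linalg.norm(codebook - f, axis=1))  # nearest word
        for d in displacements[k]:
            c = np.round(p + d).astype(int)                  # voted centre cell
            if np.all(c >= 0) and np.all(c < np.array(grid_shape)):
                acc[tuple(c)] += weights[k] / len(displacements[k])
    return acc

def peak_hypothesis(acc):
    """Return the grid cell with the highest vote mass as an event hypothesis."""
    return np.unravel_index(np.argmax(acc), acc.shape)
```

In the paper, peaks in this voting space seed initial event hypotheses, which are subsequently re-scored by a verification SVM.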
► In this paper we address activity detection in large-scale video datasets. ► A novel region extraction method is applied to reduce the initial action search space. ► A max-margin Hough transform framework is used for activity detection. ► A verification SVM is applied to obtain the final score of each detected event hypothesis. ► State-of-the-art results are reported on both large- and small-scale benchmark datasets.
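The final verification step above can be illustrated with a toy example: a binary SVM trained to separate true events from false alarms, whose decision value re-scores a detected hypothesis. The descriptors here are random stand-ins, not the paper's features; only the use of a discriminative verifier follows the abstract.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical verification stage: re-score event hypotheses with a linear SVM.
rng = np.random.default_rng(0)
pos = rng.normal(1.0, 0.3, size=(40, 8))   # stand-in descriptors of true events
neg = rng.normal(-1.0, 0.3, size=(40, 8))  # stand-in descriptors of false alarms
X = np.vstack([pos, neg])
y = np.array([1] * 40 + [0] * 40)

clf = SVC(kernel="linear").fit(X, y)

# A candidate hypothesis from the Hough voting stage, described by the
# same kind of feature vector; higher decision value = more likely a true event.
hypothesis = rng.normal(1.0, 0.3, size=(1, 8))
score = clf.decision_function(hypothesis)[0]
```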