Article ID: 405835
Journal: Neurocomputing
Published Year: 2016
Pages: 13 Pages
File Type: PDF
Abstract

Multimedia event detection (MED) in uncontrolled videos requires a very large number of labeled videos to train the event classifier, which becomes especially challenging when there are many events. Because an event usually involves several spatio-temporal objects, one intuitive solution is to model those objects from a large number of labeled images, which are easily obtained from standard image datasets such as the ImageNet challenge dataset, and to model their spatio-temporal relationships from a relatively small number of labeled videos, which are likewise easily obtained from standard video datasets such as the TRECVID MED 2012 dataset. Accordingly, in this paper we propose a latent group logistic regression (latent GLR) mixture model for those objects and an event bank descriptor for their spatio-temporal relationships. Furthermore, we develop an efficient iterative training algorithm to learn the parameters of each latent GLR mixture model; it combines coordinate descent and gradient descent to minimize the ℓ2,1-norm (group) regularized logistic loss function. We also conduct extensive experiments to evaluate the object detection performance of the latent GLR mixture model on the ImageNet challenge dataset and the event detection performance of the event bank descriptor on the TRECVID MED 2012 dataset. The results demonstrate the effectiveness of both proposed approaches.
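
To make the group-regularized logistic loss named in the abstract concrete, here is a minimal sketch of a plain (non-latent, non-mixture) binary logistic regression with a group penalty. It is not the paper's training algorithm, whose details the abstract does not give: the sketch swaps in proximal gradient descent (a gradient step followed by group soft-thresholding) in place of the combined coordinate descent and gradient descent the authors describe. All names (`group_penalty`, `logistic_loss`, `fit_group_logreg`), the synthetic data, and the hyperparameter values are assumptions made for illustration.

```python
import numpy as np

def group_penalty(w, groups):
    """Group (l2,1-style) penalty: sum of the l2 norms of each weight group."""
    return sum(np.linalg.norm(w[g]) for g in groups)

def logistic_loss(w, X, y):
    """Mean logistic loss for labels y in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def fit_group_logreg(X, y, groups, lam=0.1, step=0.1, n_iter=500):
    """Proximal gradient descent on: logistic_loss + lam * group_penalty.

    Assumption: this is a stand-in solver, not the paper's algorithm.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)
        # sigmoid(-margins), written with tanh for numerical stability
        s = 0.5 * (1.0 - np.tanh(0.5 * margins))
        grad = -(X.T @ (y * s)) / n
        w = w - step * grad
        # Group soft-thresholding: shrink each group, zeroing weak ones.
        for g in groups:
            norm = np.linalg.norm(w[g])
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            w[g] = scale * w[g]
    return w

# Synthetic data (hypothetical): features 0-1 drive the label, 2-3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
groups = [np.array([0, 1]), np.array([2, 3])]

w_unreg = fit_group_logreg(X, y, groups, lam=0.0)
w_heavy = fit_group_logreg(X, y, groups, lam=10.0)
```

The design point the sketch illustrates is that the penalty acts on whole groups of weights: a group whose gradient signal is weaker than the regularization strength is set exactly to zero as a unit, which is what distinguishes a group penalty from an ordinary per-weight ℓ1 or ℓ2 penalty.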

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence