A KPCA spatio-temporal differential geometric trajectory cloud classifier for recognizing human actions in a CBVR system

Article ID	Journal	Published Year	Pages	File Type
382413	Expert Systems with Applications	2015	19 Pages	PDF

Abstract

• A CBVR based upon spatio-temporal trajectories of human motion.
• Trajectories encode two motion scales: large (hyperplanes); small (local differential geometry).
• New fuzzy KNN classifies trajectories with proximity, orientation and approximate membership.
• Validation of video retrieval system from standard datasets and feature length films.

We describe a new algorithm for distinguishing human actions in videos, called the differential geometric trajectory cloud (DGTC) method, that captures both the fine and large scale structure of the covariant transformed spatio-temporal optical flow field. We show the utility of our algorithm in the context of a content based video retrieval (CBVR) system, in which the specific frames of a full length video (or the separate video shots in a database) that contain a queried human action are identified. In the DGTC method, the local geometry of the spatio-temporal covariant eigenspace curves, unique to each human action, is characterized by the Frenet–Serret basis equations, thereby specifying the local time averaged curvature and torsion, as well as providing a means for defining a mean osculating hyperplane for the entire trajectory. To classify a human action from a query, our system uses an adaptive distance metric between the covariant transformed query trajectory and each of the trajectories from all of the actions in the training set. Based upon the separation between the query and each class, the distance uses either large or small scale information about the trajectory: for large separations, the distance is the separation between trajectory cloud centroids, while for small and intermediate separations the distance is based upon the mean hyperplane orientation obtained from the time averaged curvature and torsion of the trajectory. Our system can function in real-time and has an accuracy greater than 93% for multiple action recognition within video repositories. We also demonstrate the use of our CBVR system for locating specific frame positions of trained actions in two feature length films.
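
As an illustration of the time averaged curvature and torsion mentioned in the abstract, the following is a minimal sketch for a sampled three-dimensional curve. It is not the DGTC implementation: the function name `mean_curvature_torsion`, the restriction to three dimensions, the uniform time step `dt`, and the finite-difference derivatives are all assumptions made for this example. It uses the standard Frenet–Serret expressions, curvature |r' × r''| / |r'|^3 and torsion (r' × r'') · r''' / |r' × r''|^2.

```python
import numpy as np


def mean_curvature_torsion(points, dt=1.0, trim=3):
    """Time averaged curvature and torsion of a sampled 3D trajectory.

    points : array of shape (N, 3), samples taken at a uniform step dt
             (hypothetical input format, assumed for this sketch).
    trim   : samples dropped at each end, where repeated finite
             differences are least accurate.
    """
    r = np.asarray(points, dtype=float)

    # First, second and third derivatives along the time axis.
    d1 = np.gradient(r, dt, axis=0)
    d2 = np.gradient(d1, dt, axis=0)
    d3 = np.gradient(d2, dt, axis=0)

    cross = np.cross(d1, d2)                  # r' x r''
    cross_norm = np.linalg.norm(cross, axis=1)
    speed = np.linalg.norm(d1, axis=1)

    eps = 1e-12  # guards against division by zero on straight segments
    kappa = cross_norm / np.maximum(speed ** 3, eps)
    tau = np.einsum("ij,ij->i", cross, d3) / np.maximum(cross_norm ** 2, eps)

    # Average over the interior samples only.
    interior = slice(trim, len(r) - trim)
    return kappa[interior].mean(), tau[interior].mean()


if __name__ == "__main__":
    # A circular helix (cos t, sin t, t) has constant curvature and torsion.
    dt = 0.01
    t = np.arange(0.0, 4.0 * np.pi, dt)
    helix = np.stack([np.cos(t), np.sin(t), t], axis=1)
    print(mean_curvature_torsion(helix, dt=dt))
```

For the helix (a cos t, a sin t, b t) the analytic curvature and torsion are a/(a² + b²) and b/(a² + b²), so with a = b = 1 both averages should come out close to 0.5, which gives a quick sanity check of the sketch.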