A novel mid-level distinctive feature learning for action recognition via diffusion map

Neurocomputing, 2016, 12 pages

Abstract

Recent work has shown that mid-level features are superior to low-level features: they not only improve discriminative power but also enhance descriptive capability. In this paper, classical spatio-temporal interest points (STIP), spatial star-graphs and temporal star-graphs are first extracted to represent human actions from multiple perspectives. A principled feature learning algorithm is then proposed that uses a diffusion map to embed these multiple cues into a unified space and enhance all of the low-level features. Rather than treating spatio-temporal patches as mid-level primitives, we use a graph to model different types of primitives and then apply graph partitioning to co-cluster them into visual-word clusters, called mid-level distinctive features, which can bridge the semantic gap across low-level features. Experimental results show that our approach classifies human activities with higher accuracy on both single-person actions (KTH and UCF) and complex interactional activities (UT-Interaction and HMDB51).
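To make the diffusion-map step more concrete, here is a minimal sketch of a standard diffusion-map embedding in Python/NumPy. It assumes generic real-valued feature vectors and a Gaussian kernel. The function name `diffusion_map` and the parameters `sigma`, `n_components` and `t` are hypothetical. This is not the authors' implementation, which additionally fuses STIP and star-graph cues and co-clusters different primitive types. The sketch only shows how a Gaussian affinity graph becomes a Markov matrix whose leading non-trivial eigenvectors give the embedding coordinates.

```python
import numpy as np


def diffusion_map(X, sigma=1.0, n_components=2, t=1):
    """Basic diffusion-map embedding of the rows of X (n_samples x n_features).

    Hypothetical helper for illustration; assumes a Gaussian kernel.
    """
    # Pairwise squared Euclidean distances (clipped to avoid tiny negatives).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian affinity graph between all samples.
    K = np.exp(-d2 / (2.0 * sigma ** 2))

    # The Markov transition matrix is P = D^{-1} K. For a stable symmetric
    # eigendecomposition, work with A = D^{-1/2} K D^{-1/2}, which has the
    # same eigenvalues as P.
    d = K.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    A = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]

    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P are psi = D^{-1/2} v.
    psi = d_inv_sqrt[:, None] * eigvecs

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale each
    # remaining coordinate by lambda^t (diffusion time t).
    k = slice(1, n_components + 1)
    return psi[:, k] * eigvals[k] ** t


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points.
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [3.0, 0.0], [3.1, 0.0], [3.0, 0.1]])
    emb = diffusion_map(X, sigma=1.0, n_components=2)
    print(emb[:, 0])  # the first diffusion coordinate separates the groups by sign
```

Splitting on the sign of the first diffusion coordinate is the simplest two-way spectral graph partition. The paper's co-clustering of heterogeneous primitives into visual-word clusters is more involved and is not reproduced here.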

Keywords

Spatio-temporal structures; Human action recognition; Diffusion map; Feature fusion; Feature learning