Article ID: 529149
Journal: Journal of Visual Communication and Image Representation
Published Year: 2015
Pages: 17
File Type: PDF
Abstract

• We propose a novel tensor bag-of-words (tBOW) model that represents the spatial structure of multimedia data.
• We propose a new tensor-based framework that effectively reveals discriminative knowledge along each mode of the tensor.
• The rank of the tensor representation is selected automatically.
• Two types of vector-based algorithms are extended to their tensor counterparts.
• We compare the proposed algorithms with state-of-the-art methods on three multimedia applications.

Tensor representations are widely used in multimedia applications. A key step in tensor processing is the CANDECOMP/PARAFAC (CP) decomposition, which expresses a tensor as a sum of rank-1 tensors and requires an estimate of the tensor rank. The ℓ2,1-norm has been shown to be effective for tensor rank selection. However, existing tensor rank selection algorithms force the same columns of all factor matrices to become zero simultaneously, whereas the truly sparse columns may differ from one factor matrix to another. Such a strategy therefore does not uncover the actual sparsity of each factor matrix. In this paper, we add a separable ℓ2,1-norm on the individual factor matrices to obtain genuinely sparse results along each mode, and then assemble these mode-specific sparse results into a joint sparse pattern for tensor rank selection. The added separable regularization term plays a twofold role: it strengthens the regularization of each factor matrix, and it fully exploits the knowledge contained in multiple factor matrices to support the rank decision. To effectively exploit the structural information of multimedia data, we also propose a tensor bag-of-words (tBOW) model that serves as the direct input to our algorithms. In the experiments, we apply the proposed algorithms to three representative multimedia analysis tasks: image classification, video action recognition, and head pose estimation. Experimental results on three public benchmark datasets show that our algorithms are effective for multimedia analysis.
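The contrast between the coupled and the separable penalty can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the paper's algorithm. The abstract does not give the full objective, so the sketch shows only a single proximal (group soft-thresholding) step applied to fixed CP factor matrices. It compares two penalties. The coupled penalty takes the ℓ2,1-norm of all factor matrices stacked on top of each other, so one shared norm decides each column. The separable penalty is the sum of the per-mode ℓ2,1-norms. The function names (`prox_coupled`, `prox_separable`, `kept_components`) are hypothetical. The joint-pattern rule is also an assumption: a component is dropped when its column is zero in any mode, because the rank-1 term built from that column then vanishes.

```python
import math
from typing import List

Matrix = List[List[float]]  # a factor matrix stored row-major: I_n rows x R columns


def column_norms(A: Matrix) -> List[float]:
    """Euclidean (l2) norm of every column of A."""
    return [math.sqrt(sum(row[r] ** 2 for row in A)) for r in range(len(A[0]))]


def l21_norm(A: Matrix) -> float:
    """l2,1-norm: the sum of the column l2 norms (a group-lasso penalty on columns)."""
    return sum(column_norms(A))


def _shrink(A: Matrix, norms: List[float], tau: float) -> Matrix:
    """Group soft-thresholding: scale column r by max(0, 1 - tau / norms[r])."""
    scale = [max(0.0, 1.0 - tau / n) if n > 0 else 0.0 for n in norms]
    return [[v * scale[r] for r, v in enumerate(row)] for row in A]


def prox_separable(factors: List[Matrix], tau: float) -> List[Matrix]:
    """Prox of tau * sum_n ||A_n||_{2,1}: each mode is thresholded on its own norms."""
    return [_shrink(A, column_norms(A), tau) for A in factors]


def prox_coupled(factors: List[Matrix], tau: float) -> List[Matrix]:
    """Prox of tau * ||[A_1; ...; A_N]||_{2,1}: one shared norm per column, so a
    column is zeroed in all modes at once or in none of them."""
    stacked = [row for A in factors for row in A]
    norms = column_norms(stacked)
    return [_shrink(A, norms, tau) for A in factors]


def kept_components(factors: List[Matrix], eps: float = 1e-12) -> List[int]:
    """Joint sparse pattern (assumed rule): keep rank-1 component r only if its
    column is nonzero in every mode; otherwise the outer product is zero."""
    norms = [column_norms(A) for A in factors]
    R = len(factors[0][0])
    return [r for r in range(R) if all(n[r] > eps for n in norms)]


# Toy third-order example with R = 3 candidate components.
A1 = [[1.0, 0.05, 2.0], [0.5, 0.02, 1.0]]  # column 1 is weak only in mode 1
A2 = [[0.8, 1.5, 0.01], [0.3, 0.7, 0.02]]  # column 2 is weak only in mode 2
A3 = [[1.2, 0.9, 1.1]]
factors = [A1, A2, A3]

print(kept_components(prox_separable(factors, 0.1)))  # [0]
print(kept_components(prox_coupled(factors, 0.1)))    # [0, 1, 2]
```

In the coupled case, the strong entries of column 1 in modes 2 and 3 keep that column's stacked norm large, so the weak mode-1 entries are never zeroed. The separable penalty thresholds each mode on its own norms and exposes the mode-specific zeros that the joint pattern then combines.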
