Article ID: 6864599
Journal: Neurocomputing
Published Year: 2018
Pages: 10 Pages
File Type: PDF
Abstract
With the rapid development of deep convolutional neural networks (CNNs), more and more high-accuracy CNN architectures have been proposed for multimedia and vision applications. Among them, Bilinear CNN (B-CNN) computes the outer product of the convolutional-layer outputs of two CNN streams, and has attracted considerable attention due to its effectiveness in fine-grained categorization. However, B-CNN fails to use the information inherent in different convolutional layers, and its computation and storage costs are high because two CNN models must be evaluated. In this paper, we propose a family of Hyperlayer Bilinear Pooling (HLBP) methods to overcome these limitations of B-CNN. Evaluating a single CNN model only once, our HLBP methods effectively exploit the second-order statistics of features from multiple layers. Besides fine-grained categorization, we also apply the proposed HLBP methods to the content-based image retrieval task. Extensive experiments on seven widely used benchmarks demonstrate that our HLBP methods are superior to their counterparts (e.g., B-CNN), achieving competitive or better performance compared with state-of-the-art methods on both fine-grained categorization and content-based image retrieval.
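The abstract does not spell out the HLBP formulation, so the following is only a minimal sketch of the general idea it describes: bilinear (outer-product) pooling between the features of two layers of a single network. It assumes both layers have already been brought to the same number of spatial locations, and it applies the signed square root and L2 normalization commonly used with B-CNN descriptors; that post-processing is an assumption here, not a detail taken from the abstract. The function name `bilinear_pool` and the toy feature values are hypothetical.

```python
import math


def bilinear_pool(feats_a, feats_b):
    """Sum-pooled outer product of two feature maps (illustrative sketch).

    feats_a: list of N location vectors, each of length Ca
    feats_b: list of N location vectors, each of length Cb
    Both maps must cover the same N spatial locations (assumption).
    Returns a flattened descriptor of length Ca * Cb.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("feature maps must share the same spatial locations")
    ca, cb = len(feats_a[0]), len(feats_b[0])

    # Accumulate the outer product x_a * x_b^T over all spatial locations.
    pooled = [0.0] * (ca * cb)
    for xa, xb in zip(feats_a, feats_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += xa[i] * xb[j]

    # Signed square root, then L2 normalization (assumed post-processing).
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


# Toy features standing in for two layers of the same CNN:
# 2 spatial locations, 2 channels in one layer and 3 in the other.
layer_a = [[1.0, 0.0], [0.0, 2.0]]
layer_b = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
descriptor = bilinear_pool(layer_a, layer_b)
```

Because both inputs come from one forward pass of one model, a scheme along these lines avoids evaluating a second CNN stream, which is the cost the abstract attributes to B-CNN.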
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence