Hyperlayer Bilinear Pooling with application to fine-grained categorization and image retrieval

Article ID	Journal	Published Year	Pages	File Type
6864599	Neurocomputing	2018	10 Pages	PDF

Abstract

With the rapid development of deep convolutional neural networks (CNNs), a growing number of high-accuracy CNN architectures have been proposed for multimedia and vision applications. Among them, the Bilinear CNN (B-CNN) computes the outer product of the convolutional-layer outputs of two CNN streams and has attracted considerable attention for its effectiveness in fine-grained categorization. However, B-CNN does not exploit the information inherent in different convolutional layers, and its computation and storage costs are high because two CNN models must be evaluated. In this paper, we propose a family of Hyperlayer Bilinear Pooling (HLBP) methods to overcome these limitations. Evaluating a single CNN model only once, our HLBP methods effectively exploit the second-order statistics of features from multiple layers. Besides fine-grained categorization, we also apply the proposed HLBP methods to content-based image retrieval. Extensive experiments on seven widely used benchmarks demonstrate that our HLBP methods are superior to their counterparts (e.g., B-CNN), achieving performance competitive with or better than the state of the art on both fine-grained categorization and content-based image retrieval.
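
For orientation, the outer-product pooling mentioned in the abstract can be sketched in a few lines. This is a minimal NumPy sketch of generic bilinear pooling between two feature maps, not the HLBP formulation itself, which the abstract does not specify. The function name `bilinear_pool`, the random stand-in feature maps, the requirement that both maps share a spatial size, and the signed square-root and L2 normalization steps (commonly used in B-CNN pipelines) are all assumptions made for illustration.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Pool two feature maps of shape (C1, H, W) and (C2, H, W)
    into a single descriptor of length C1 * C2 (illustrative only)."""
    c1, h, w = feat_a.shape
    c2 = feat_b.shape[0]
    if feat_b.shape[1:] != (h, w):
        raise ValueError("feature maps must share the same spatial size")

    # Flatten spatial positions: one column per location.
    a = feat_a.reshape(c1, h * w)
    b = feat_b.reshape(c2, h * w)

    # Summing the per-location outer products a_i b_i^T over all
    # locations is equivalent to a single matrix product.
    pooled = a @ b.T                                # shape (C1, C2)

    vec = pooled.reshape(-1)
    vec = np.sign(vec) * np.sqrt(np.abs(vec))       # signed square root (assumed)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec          # L2 normalization (assumed)

# Hypothetical stand-ins for two layers' activations from one forward pass.
rng = np.random.default_rng(0)
layer_x = rng.standard_normal((8, 7, 7))
layer_y = rng.standard_normal((16, 7, 7))
descriptor = bilinear_pool(layer_x, layer_y)
print(descriptor.shape)
```

Passing activations from two different layers of the same network, as in the usage lines above, is one plausible reading of how a single forward pass could supply both inputs; passing the same map twice gives the ordinary second-order statistics of one layer.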

Keywords

Image retrieval; Convolutional neural networks