Leveraging multi-modal fusion for graph-based image annotation

Journal: Journal of Visual Communication and Image Representation
Published Year: 2018
Pages: 13

Abstract

When each visual feature is treated as a separate modality in the image annotation task, efficient fusion of the different modalities is essential in graph-based learning. Traditional graph-based methods assign one node to each image and combine its visual features into a single descriptor before constructing the graph. In this paper, we propose an approach that constructs a subgraph for each modality, where the edges of each subgraph are determined by a search-based approach that handles the class-imbalance challenge in annotation datasets. The subgraphs are then connected to one another to form a supergraph. We then introduce a learning framework that infers the tags of unannotated images on the supergraph. The proposed approach thus exploits graph-based semi-supervised learning and multi-modal representation simultaneously. We evaluate its performance on several datasets, and the results show that it improves the accuracy of annotation systems.
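The abstract only outlines the pipeline, so the following is a minimal Python/NumPy sketch of the general supergraph idea under stated assumptions, not the authors' method. A plain k-nearest-neighbour subgraph per modality stands in for the paper's search-based, imbalance-aware edge selection; copies of the same image in different modalities are linked by cross-modality edges of a hypothetical weight `beta`; and tags are inferred with standard label spreading (the "local and global consistency" closed form), a swapped-in technique that may differ from the paper's learning framework. All function names (`knn_subgraph`, `build_supergraph`, `propagate`, `annotate`) and parameters (`k`, `beta`, `alpha`) are hypothetical.

```python
import numpy as np


def knn_subgraph(X, k, sigma=None):
    """Symmetric kNN affinity matrix for one modality, with Gaussian weights.

    Assumption: plain kNN replaces the paper's search-based,
    class-imbalance-aware edge selection, which is not reproduced here.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clamped to avoid tiny negatives.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if sigma is None:
        # Heuristic bandwidth (assumption): root-mean-square pairwise distance.
        sigma = np.sqrt(d2[d2 > 0].mean()) if np.any(d2 > 0) else 1.0
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in np.argsort(d2[i]) if j != i][:k]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)  # symmetrize: undirected subgraph


def build_supergraph(modalities, k=3, beta=1.0):
    """Place one subgraph per modality on the diagonal and link image copies."""
    n, M = modalities[0].shape[0], len(modalities)
    W = np.zeros((M * n, M * n))
    for m, X in enumerate(modalities):
        block = slice(m * n, (m + 1) * n)
        W[block, block] = knn_subgraph(X, k)
    # Hypothetical cross-modality edges: image i in modality a <-> image i in b.
    idx = np.arange(n)
    for a in range(M):
        for b in range(M):
            if a != b:
                W[a * n + idx, b * n + idx] = beta
    return W


def propagate(W, Y, alpha=0.9):
    """Label spreading: F = (1 - alpha) (I - alpha S)^{-1} Y, S = D^-1/2 W D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    N = W.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(N) - alpha * S, Y)


def annotate(modalities, Y, k=3, beta=1.0, alpha=0.9):
    """Per-image tag scores, averaged over the image's modality copies."""
    n, M = Y.shape[0], len(modalities)
    W = build_supergraph(modalities, k, beta)
    F = propagate(W, np.tile(Y, (M, 1)), alpha)  # seeds copied to every modality
    return F.reshape(M, n, -1).mean(axis=0)


# Toy usage: two hypothetical modalities, 10 images in 2 well-separated clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
cluster = np.repeat([0, 1], 5)
color = centers[cluster] + 0.3 * rng.standard_normal((10, 2))
texture = -centers[cluster] + 0.3 * rng.standard_normal((10, 2))
Y = np.zeros((10, 2))
Y[0, 0] = 1.0  # image 0 annotated with tag 0
Y[5, 1] = 1.0  # image 5 annotated with tag 1
scores = annotate([color, texture], Y, k=3)
print(scores.argmax(axis=1))  # → [0 0 0 0 0 1 1 1 1 1]
```

In this toy setting the two clusters share no kNN edges, so each seed tag spreads only through its own cluster and across that cluster's modality copies; averaging the copies yields one score vector per image. The traditional approach the abstract contrasts with would instead concatenate `color` and `texture` into a single descriptor per image and build one graph over those descriptors.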

Keywords

Supergraph; TAG; Image annotation; Manifold; Graph-based learning