| Article ID | Journal | Published Year | Pages | File Type |
| --- | --- | --- | --- | --- |
| 532227 | Pattern Recognition | 2013 | 11 Pages | |
In recent years, massive amounts of image data have been emerging on the web. Precisely labeling these images is critical yet challenging for modern image search engines. As web image content grows increasingly complex, existing image-level tagging methods become less effective and can hardly achieve satisfactory performance. This raises an urgent need for fine-grained tagging, e.g., region-level tagging. In this work, we study how to establish a mapping between tags and image regions. In particular, a novel hierarchical local image tagging method is proposed to simultaneously assign tags to all the regions within the same image. We propose a Laplacian Joint Group Lasso (LJGL) model to jointly reconstruct the regions within a test image from a set of labeled training data. The LJGL model not only exploits the robust encoding ability of the joint group lasso but also preserves the local structural information embedded in the test regions. In addition, we extend the LJGL model to a kernel version in order to achieve non-linear reconstruction. An effective algorithm is devised to optimize the objective function of the proposed model. Tags of the training data are propagated to the reconstructed regions according to the reconstruction coefficients. Extensive experiments on four public image datasets demonstrate that the proposed models achieve significant performance improvements over state-of-the-art methods in local image tagging.
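The abstract does not reproduce the LJGL objective itself; as a rough sketch of what a Laplacian-regularized joint group lasso reconstruction of this kind typically looks like (the symbols $Y$, $X$, $W_g$, $L$, $\lambda$, $\gamma$ are our notation and assumptions, not necessarily the paper's):

$$
\min_{W}\; \frac{1}{2}\bigl\|Y - X W\bigr\|_F^2 \;+\; \lambda \sum_{g=1}^{G} \bigl\|W_g\bigr\|_F \;+\; \gamma\, \operatorname{tr}\!\bigl(W L W^{\top}\bigr),
$$

where $Y$ stacks the feature vectors of the regions of a test image, $X = [X_1, \dots, X_G]$ collects the labeled training regions arranged in $G$ groups, $W_g$ is the block of reconstruction coefficients associated with group $g$ (so the group penalty encourages all test regions to jointly select the same training groups), and $L$ is a graph Laplacian built over the test regions, which pushes similar regions toward similar coefficients. A kernel version would replace the Frobenius reconstruction term with its counterpart in a reproducing kernel feature space.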
► We devise a novel hierarchical local image tagging method for setting up correspondence between tags and image regions. ► A graph-regularized joint group sparse coding model is proposed to simultaneously reconstruct all regions within the same image. ► We generalize the proposed model to a kernel version that can handle non-linear reconstruction. ► An efficient algorithm is developed to optimize the proposed model. ► Extensive experiments on multiple real-world image datasets illustrate the superiority of our method.
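The abstract states that tags of the training data are propagated to the reconstructed regions according to the reconstruction coefficients. Below is a minimal, hypothetical sketch of such a coefficient-weighted propagation step (the function name, array layout, and normalization are our assumptions, not the paper's procedure):

```python
import numpy as np

def propagate_tags(W, train_tags, top_k=3):
    """Propagate tags from training regions to test regions.

    W          : (n_train, n_test) reconstruction coefficients, one column per
                 test region (hypothetical layout, not taken from the paper).
    train_tags : (n_train, n_tags) binary tag matrix of the training regions.
    Returns a (n_test, n_tags) score matrix and the top_k tag indices per region.
    """
    weights = np.abs(W)                                     # coefficient magnitudes
    weights /= weights.sum(axis=0, keepdims=True) + 1e-12   # normalize per test region
    scores = weights.T @ train_tags                         # weighted vote over training tags
    top = np.argsort(-scores, axis=1)[:, :top_k]            # top_k tag indices per region
    return scores, top
```

In practice the columns of `W` would come from solving the reconstruction objective; here they are simply treated as given.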