Article code: 4968815
Journal code: 1449748
Publication year: 2017
English article: 16-page PDF
Full-text version: Free download
English title of the ISI article
A support vector approach for cross-modal search of images and texts
Persian translation of the title
A support vector approach for cross-modal search of images and texts
Keywords
Image search, Image description, Cross-media analysis
Related subjects
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
English abstract


- We propose a novel and generic approach for cross-modal search based on Structural SVM.
- Our approach provides max-margin guarantees and better generalization than competing methods.
- We analyze and compare different aspects of our approach such as training and testing time, and performance across datasets.
- Extensive experiments demonstrate the efficacy of our approach.

Building bilateral semantic associations between images and texts is among the fundamental problems in computer vision. In this paper, we study two complementary cross-modal prediction tasks: (i) predicting text(s) given a query image (“Im2Text”), and (ii) predicting image(s) given a piece of text (“Text2Im”). We make no assumption about the specific form of the text; i.e., it could be a set of labels, phrases, or even captions. We pose both of these tasks in a retrieval framework. For Im2Text, given a query image, our goal is to retrieve a ranked list of semantically relevant texts from an independent text corpus (i.e., texts with no corresponding images). Similarly, for Text2Im, given a query text, we aim to retrieve a ranked list of semantically relevant images from a collection of unannotated images (i.e., images without any associated textual meta-data). We propose a novel Structural SVM based unified framework for these two tasks, and show how it can be efficiently trained and tested. Using a variety of loss functions, extensive experiments are conducted on three popular datasets (two medium-scale datasets containing a few thousand samples, and one web-scale dataset containing one million samples). The experiments demonstrate that our framework gives promising results compared to competing baseline cross-modal search techniques, thus confirming its efficacy.
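To make the retrieval formulation concrete, the sketch below illustrates, under simplifying assumptions, how a structural-SVM-style compatibility score can be trained and then used for both Im2Text and Text2Im ranking. It is not the paper's implementation: the bilinear joint feature map (an outer product of image and text feature vectors), the 0/1 task loss, the stochastic subgradient optimizer, and the helper names (`train`, `im2text`, `text2im`) are all illustrative assumptions.

```python
import numpy as np

# Minimal, self-contained sketch of a structural-SVM-style cross-modal
# scorer (an illustration, NOT the paper's implementation). Assumptions:
# images and texts are fixed-length feature vectors, the joint feature
# map is phi(x, y) = vec(x y^T) so the score is bilinear (x^T W y), the
# task loss is 0/1, and training is plain stochastic subgradient descent
# on the margin-rescaled structured hinge loss.

def train(images, texts, pairs, epochs=20, lr=0.01, reg=1e-3, seed=0):
    """images: (n, dx); texts: (m, dy); pairs: matching (image_idx, text_idx)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((images.shape[1], texts.shape[1]))
    for _ in range(epochs):
        for i, j in rng.permutation(pairs):
            x, y_true = images[i], texts[j]
            # Loss-augmented inference: find the most violating text
            # under the 0/1 loss (every wrong text costs 1).
            margins = texts @ (W.T @ x) + 1.0
            margins[j] -= 1.0                  # the true text incurs no loss
            j_hat = int(np.argmax(margins))
            # Subgradient step: shrink (L2 regularizer), then move toward
            # the true text and away from the most violating one.
            W *= 1.0 - lr * reg
            if j_hat != j:
                W += lr * np.outer(x, y_true - texts[j_hat])
    return W

def im2text(W, x, texts, k=5):
    """Indices of the top-k texts for a query image x."""
    return np.argsort(-(texts @ (W.T @ x)))[:k]

def text2im(W, y, images, k=5):
    """Indices of the top-k images for a query text y."""
    return np.argsort(-(images @ (W @ y)))[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    imgs = rng.normal(size=(50, 32))           # toy image features
    txts = rng.normal(size=(50, 16))           # toy text features
    W = train(imgs, txts, [(i, i) for i in range(50)])
    print("Im2Text:", im2text(W, imgs[0], txts))
    print("Text2Im:", text2im(W, txts[0], imgs))
```

Note that this toy version trains only the image-to-text direction and reuses the same bilinear score for Text2Im; a fuller treatment along the lines described in the abstract would use ranking-oriented loss functions and a more scalable loss-augmented inference step than the exhaustive argmax shown here.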

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Vision and Image Understanding - Volume 154, January 2017, Pages 48-63