Article ID: 527851 · Journal ID: 869388 · Year: 2012 · 12-page PDF · Free full-text download (English version)
English Title (ISI Article)
Discovering hierarchical object models from captioned images
Related Topics
Engineering and Basic Sciences › Computer Engineering › Computer Vision and Pattern Recognition
English Abstract

We address the problem of automatically learning the recurring associations between the visual structures in images and the words in their associated captions, yielding a set of named object models that can be used for subsequent image annotation. In previous work, we used language to drive the perceptual grouping of local features into configurations that capture small parts (patches) of an object. However, model scope was poor, leading to poor object localization during detection (annotation), and ambiguity was high when part detections were weak. We extend and significantly revise our previous framework by using language to drive the perceptual grouping of parts, each a configuration in the previous framework, into hierarchical configurations that offer greater spatial extent and flexibility. The resulting hierarchical multipart models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. Moreover, unlike typical frameworks for learning object models, our approach requires no bounding boxes around the objects to be learned, can handle heavily cluttered training scenes, and is robust in the face of noisy captions, i.e., where objects in an image may not be named in the caption, and objects named in the caption may not appear in the image. We demonstrate improved precision and recall in annotation over the non-hierarchical technique and also show extended spatial coverage of detected objects.


► We learn to recognize exemplars from unstructured collections of captioned images.
► Using language, we perceptually group local features into meaningful parts.
► We further group discovered parts into flexible hierarchical configurations.
► Learned visual structures are scale, translation and rotation invariant.
► Learning is robust to distractors, clutter, ambiguous and incomplete captions.
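The core idea summarized above is that recurring word–structure associations can be mined from co-occurrence statistics across the captioned collection. A minimal, illustrative sketch of such an association score follows; the data, part labels, and the simple conditional-probability measure are assumptions for demonstration, not the paper's actual model:

```python
from collections import Counter
from itertools import product

# Toy captioned dataset: each image pairs a set of caption words with a set
# of detected visual part labels (stand-ins for feature configurations).
images = [
    ({"car", "road"}, {"p1", "p2"}),
    ({"car", "tree"}, {"p1", "p3"}),
    ({"dog", "tree"}, {"p4", "p3"}),
    ({"car"},         {"p1"}),
]

def association_scores(images):
    """Score each (word, part) pair by how often the part appears
    in images whose caption contains the word: P(part | word)."""
    word_counts, pair_counts = Counter(), Counter()
    for words, parts in images:
        word_counts.update(words)
        pair_counts.update(product(words, parts))
    return {(w, p): c / word_counts[w] for (w, p), c in pair_counts.items()}

scores = association_scores(images)
# Part most strongly associated with the word "car".
best_part = max((p for (w, p) in scores if w == "car"),
                key=lambda p: scores[("car", p)])  # "p1" in this toy data
```

In this toy collection, "p1" appears in every image captioned "car", so it scores 1.0 and is picked as the strongest candidate part; a real pipeline would combine such evidence with spatial grouping and noise handling, as the abstract describes.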

Publisher
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computer Vision and Image Understanding - Volume 116, Issue 7, July 2012, Pages 842–853
Authors