Discovering object aspects from video

Article ID	Journal	Published Year	Pages	File Type
526743	Image and Vision Computing	2016	12 Pages	PDF

Abstract

• We define aspects as four factors of variation: viewpoint, pose, occlusions, and cropping.
• We explore weakly supervised aspect discovery from video.
• We introduce a new video dataset containing over 10,000 annotated frames.
• We propose a novel framework for direct aspect discovery evaluation.
• Using video consistently leads to better aspect discovery compared to still images.

We investigate the problem of automatically discovering the visual aspects of an object class. Existing methods discover aspects from still images under strong supervision, as they require time-consuming manual annotation of the objects' locations (e.g. bounding boxes). Instead, we explore using video, which enables automatic localisation by motion segmentation. We introduce a new video dataset containing over 10,000 frames annotated with aspect labels for two classes: cars and tigers. We evaluate several strategies for aspect discovery using state-of-the-art descriptors (e.g. CNN), and assess the benefits of using automatic video segmentation. For this, we introduce a new protocol that evaluates aspect discovery directly, in contrast to the general trend of evaluating it indirectly (e.g. through its impact on a recognition pipeline). Our results consistently show that leveraging the nature of video to discover visual aspects yields significantly more accurate aspect discovery. Finally, we discuss two new applications to showcase the potential of aspect discovery: image retrieval of aspects, and learning aspect transitions from video.