Article ID | Journal | Published Year | Pages |
---|---|---|---|
527048 | Image and Vision Computing | 2011 | 14 Pages |
We describe a visual recognition system operating on a hand-held device, based on a video-based feature descriptor, and characterize its invariance and discriminative properties. Feature selection and tracking are performed in real time and are used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system recognizes objects in the field of view based on their ranking. Severe resource constraints have prompted a re-evaluation of existing algorithms, improving both their performance (accuracy and robustness) and their computational efficiency. We motivate the design choices in the implementation with a characterization of the stability properties of local invariant detectors, and of the conditions under which a template-based descriptor is optimal. The analysis also highlights the role of time as a “weak supervisor” during training, which we exploit in our implementation.
Highlights
► We analyze and derive representations of objects from video.
► We integrate multi-scale detection and tracking.
► We derive a video-based feature descriptor.
► We describe a visual recognition system for a hand-held device.
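
The abstract does not specify how the template-based descriptor or the ranking is computed, so the following is only an illustrative sketch, not the authors' method. It assumes that a template is the plain average of the descriptors observed along one feature track (so that temporal continuity, rather than manual labels, groups the training samples) and that candidate objects are ranked by Euclidean distance to the query descriptor. The function names `mean_template` and `rank_objects` and the toy two-dimensional descriptors are hypothetical.

```python
import math


def mean_template(track):
    """Average the per-frame descriptors of one tracked feature.

    Assumption: the template is a simple mean over time; the paper's
    actual descriptor may be constructed differently.
    """
    n = len(track)
    dim = len(track[0])
    return [sum(frame[i] for frame in track) / n for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_objects(query, templates):
    """Return object labels ordered from closest to farthest template."""
    return sorted(templates, key=lambda label: distance(query, templates[label]))


# Toy capture phase: each object contributes one short track of descriptors.
tracks = {
    "mug": [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]],
    "book": [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]],
}
templates = {label: mean_template(track) for label, track in tracks.items()}

# Toy recognition phase: rank the stored objects for one query descriptor.
ranking = rank_objects([0.7, 0.3], templates)
print(ranking)
```

In this sketch the only supervision is the grouping of descriptors by track, which is one simple way to read the idea of time acting as a weak supervisor during training.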