Article ID: 6939634
Journal: Pattern Recognition
Published Year: 2018
Pages: 39
File Type: PDF
Abstract
One of the major challenges in person Re-Identification (ReID) is the inconsistent visual appearance of a person. Current work on visual feature and distance metric learning has made significant progress, but still suffers from limited robustness to pose variations, viewpoint changes, etc., and from high computational complexity. As a result, person ReID across multiple cameras remains challenging. This work aims to learn mid-level human attributes that are robust to variations in visual appearance and can serve as efficient features for person matching. We propose a weakly supervised multi-type attribute learning framework that considers the contextual cues among attributes and progressively boosts attribute accuracy using only a limited amount of labeled data. Specifically, the framework involves three-stage training. A deep Convolutional Neural Network (dCNN) is first trained on an independent dataset labeled with attributes. It is then fine-tuned on another dataset labeled only with person IDs, using our defined triplet loss. Finally, the updated dCNN predicts attribute labels for the target dataset, which is combined with the independent dataset for the final round of fine-tuning. The predicted attributes, namely deep attributes, exhibit promising generalization ability across different datasets. By directly using the deep attributes with a simple Cosine distance, we obtain competitive accuracy on four person ReID datasets. Experiments also show that a simple distance metric learning module further boosts our method, making it outperform many recent works.
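The matching step described in the abstract, comparing predicted attribute vectors by Cosine distance, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the attribute names, the scores, the function names, and the margin value are hypothetical. The triplet loss shown is the generic hinge form, not the loss the authors define, which the abstract does not specify.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat an all-zero vector as dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery IDs sorted by ascending cosine distance to the query."""
    scored = [(cosine_distance(query, vec), pid) for pid, vec in gallery.items()]
    scored.sort()
    return [pid for _, pid in scored]


def triplet_hinge_loss(anchor, positive, negative, margin=0.3):
    """Generic triplet hinge loss (illustrative; not the paper's definition).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (a hypothetical value).
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical attribute scores per image, e.g.
# [long hair, backpack, dark trousers, male].
query = [0.9, 0.8, 0.1, 0.7]
gallery = {
    "person_a": [0.8, 0.9, 0.2, 0.6],
    "person_b": [0.1, 0.2, 0.9, 0.3],
    "person_c": [0.5, 0.5, 0.5, 0.5],
}

ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy gallery the query is closest to person_a, whose attribute scores follow the same pattern, and farthest from person_b. Because matching reduces to a distance between short attribute vectors, ranking a gallery needs no learned metric, which is in line with the efficiency argument made in the abstract.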
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition