Article Code | Journal Code | Year Published | Paper | Full Text |
---|---|---|---|---|
4969603 | 1449975 | 2017 | 38-page PDF | available for download |
English Title of the ISI Article
Multi-modal deep feature learning for RGB-D object detection
Related Topics
Engineering and Basic Sciences
Computer Engineering
Computer Vision and Pattern Recognition

Abstract (English)
We present a novel multi-modal deep feature learning architecture for RGB-D object detection. The current paradigm for object detection typically consists of two stages: objectness estimation and region-wise object recognition. Most existing RGB-D object detection approaches treat the two stages separately by extracting RGB and depth features individually, thus ignoring the correlated relationship between these two modalities. In contrast, our proposed method is designed to take full advantage of both depth and color cues by exploiting both modality-correlated and modality-specific features and jointly performing RGB-D objectness estimation and region-wise object recognition. Specifically, a shared-weights strategy and a parameter-free correlation layer are exploited to carry out RGB-D-correlated objectness estimation and region-wise recognition in conjunction with RGB-specific and depth-specific procedures. The parameters of these three networks are simultaneously optimized via end-to-end multi-task learning. The multi-modal RGB-D objectness estimation results and RGB-D object recognition results are both boosted by a late-fusion ensemble. To validate the effectiveness of the proposed approach, we conduct extensive experiments on two challenging RGB-D benchmark datasets, NYU Depth v2 and SUN RGB-D. The experimental results show that by introducing the modality-correlated feature representation, the proposed multi-modal RGB-D object detection approach is substantially superior to the state-of-the-art competitors. Moreover, compared to the expensive deep architecture (VGG16) preferred by the state-of-the-art methods, our approach, which is built upon the more lightweight AlexNet architecture, performs slightly better.
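The abstract describes three parallel streams (RGB-specific, depth-specific, and modality-correlated) whose outputs are combined by a late-fusion ensemble, with the correlated stream built from a parameter-free correlation layer. The following is a minimal NumPy sketch of that data flow, not the paper's implementation: the element-wise product used for the correlation layer and the simple score averaging used for late fusion are both illustrative assumptions.

```python
import numpy as np

def correlation_layer(rgb_feat, depth_feat):
    """Parameter-free fusion of two same-shaped feature maps.

    The paper only states that the layer has no learnable parameters;
    an element-wise product is assumed here purely for illustration.
    """
    assert rgb_feat.shape == depth_feat.shape
    return rgb_feat * depth_feat

def late_fusion(scores_rgb, scores_depth, scores_corr):
    """Late-fusion ensemble: average the per-class scores of the three streams."""
    return (scores_rgb + scores_depth + scores_corr) / 3.0

rng = np.random.default_rng(0)
# Toy region-wise feature maps, e.g. 256 channels over a 7x7 pooled region.
rgb = rng.standard_normal((256, 7, 7))
depth = rng.standard_normal((256, 7, 7))

corr = correlation_layer(rgb, depth)

# Toy per-class scores, as if produced by each stream's classifier head.
s_rgb, s_depth, s_corr = rng.random(20), rng.random(20), rng.random(20)
fused = late_fusion(s_rgb, s_depth, s_corr)
print(fused.shape)  # (20,)
```

In the paper the three streams are trained jointly via end-to-end multi-task learning; this sketch only shows the inference-time fusion step.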
Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 72, December 2017, Pages 300-313
Authors
Xiangyang Xu, Yuncheng Li, Gangshan Wu, Jiebo Luo