Article ID | Journal | Published Year | Pages
---|---|---|---
6949052 | ISPRS Journal of Photogrammetry and Remote Sensing | 2018 | 12
Abstract
We address the semantic segmentation of large-scale 3D scenes by fusing 2D images with 3D point clouds. First, a Deeplab-VGG16 based Large-Scale and High-Resolution (DVLSHR) model, built on the deep Visual Geometry Group network (VGG16), is created and fine-tuned by training seven deep convolutional neural networks on four benchmark datasets. On the Cityscapes validation set, DVLSHR achieves 74.98% mean Pixel Accuracy (mPA) and 64.17% mean Intersection over Union (mIoU), and can be adapted to segment the captured images (resolution 2832 × 4256 pixels). Second, the preliminary 2D segmentation results are mapped to the 3D point clouds according to the coordinate relationships between the images and the point clouds. Third, based on the mapping results, fine-grained building features are extracted directly from the 3D point clouds. Our experiments show that the proposed fusion method segments both local and global features efficiently and effectively.
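The second step, transferring per-pixel labels from the 2D segmentation onto the 3D point cloud, can be illustrated with a minimal sketch. The snippet below assumes a simple pinhole camera model with a known 3 × 4 projection matrix; the function name `map_labels_to_points` and all inputs are hypothetical, and the paper's actual coordinate relationships may differ.

```python
import numpy as np

def map_labels_to_points(points, labels_2d, P):
    """Assign each 3D point the label of the pixel it projects onto.

    points    -- (N, 3) array of 3D point coordinates
    labels_2d -- (H, W) array of per-pixel class labels from the 2D model
    P         -- (3, 4) camera projection matrix (intrinsics @ extrinsics)
    Returns an (N,) array of labels; points that fall outside the image
    (or behind the camera) get -1.
    """
    h, w = labels_2d.shape
    # Homogeneous coordinates (N, 4), projected onto the image plane (N, 3).
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    proj = pts_h @ P.T
    # Only points in front of the camera have a valid projection.
    in_front = proj[:, 2] > 0
    uv = np.full((len(points), 2), -1.0)
    uv[in_front] = proj[in_front, :2] / proj[in_front, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels_3d = np.full(len(points), -1, dtype=np.int64)
    labels_3d[inside] = labels_2d[v[inside], u[inside]]
    return labels_3d

# Illustrative usage: label a random cloud with a random segmentation mask.
# points = np.random.rand(1000, 3); seg = np.zeros((2832, 4256), dtype=int)
# labels = map_labels_to_points(points, seg, P)
```

Points visible in several images could be labeled by majority vote over the per-image results; the sketch handles only a single view.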
Related Topics
Physical Sciences and Engineering
Computer Science
Information Systems
Authors
Rui Zhang, Guangyun Li, Minglei Li, Li Wang