Article ID Journal Published Year Pages File Type
8965165 Neurocomputing 2018 34 Pages PDF
Abstract
Deep convolutional neural networks (CNNs) have recently made revolutionary improvements in salient object detection. However, most existing CNN-based models fail to precisely separate the whole salient object(s) from a cluttered background due to the downsampling effects or the patch-level operation. In this paper, we propose a multi-scale deep encoder-decoder network which learns discriminative saliency cues and computes confidence scores in an end-to-end fashion. The encoder network extracts meaningful and informative features in a global view, and the decoder network recovers lost detailed object structure in a local perspective. By taking multiple resized images as the inputs, the proposed model incorporates multi-scale features from a shared network and predicts a fine-grained saliency map at the pixel level. To easily and efficiently train the whole network, the light-weighted decoder breaks through the limit of conventional symmetric structure. In addition, a two-stage training strategy is designed to encourage the robustness and accuracy of the network. Without any post-processing steps, our method is capable of significantly reducing the computation complexity while densely segmenting foreground objects from an image. Extensive experiments on six challenging datasets demonstrate that the proposed model outperforms other state-of-the-art approaches in terms of various evaluation metrics.
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, ,