Ada-Sal Network: emulate the Human Visual System

Article ID	Journal	Published Year	Pages	File Type
6941813	Signal Processing: Image Communication	2016	10 Pages	PDF

Abstract

Convolutional neural networks (CNNs) have become the state of the art for image classification. Inspired by the physiological mechanism of saliency in the human visual system (HVS), we previously proposed the Sal-Mask Connection. Because the HVS tends to select specific regions of the visual field according to their saliency when interpreting complex scenes, we use saliency data as an element-by-element mask on the feature maps learned by convolutional connections. The effectiveness of the Sal-Mask Connection was verified in our previous work. However, its performance depends on the saliency data used, and current saliency algorithms are not designed for image classification, so a more suitable saliency is needed. In this paper, we therefore propose the Ada-Sal Network, which learns saliency adaptively while the feature maps are being trained. Experiments on the CIFAR-10 and STL-10 datasets are carried out with three different networks. In each test, we compare the performance of the benchmark network and of Sal-Mask Networks using two different saliencies with that of the Ada-Sal Network. The results indicate that the Ada-Sal Network outperforms not only traditional networks but also the Sal-Mask Network using non-adaptive saliency. Visualization of the networks shows that the adaptively learned saliency combines the merits of the input saliencies and seems to work better in most cases.
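The element-by-element masking described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `resize_nearest` and `sal_mask`, the nearest-neighbour resizing, and the min-max normalisation of the saliency map to [0, 1] are all assumptions made for illustration. It covers only the fixed (non-adaptive) mask, since the abstract does not say how the Ada-Sal Network learns its saliency.

```python
import numpy as np


def resize_nearest(saliency, out_h, out_w):
    """Nearest-neighbour resize of a 2-D saliency map to (out_h, out_w).

    Hypothetical helper: the abstract does not say how saliency maps are
    matched to the feature-map resolution.
    """
    in_h, in_w = saliency.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return saliency[rows[:, None], cols[None, :]]


def sal_mask(feature_maps, saliency):
    """Apply a saliency map as an element-wise mask on feature maps.

    feature_maps: array of shape (C, H, W), e.g. the output of a conv layer.
    saliency:     2-D array of any resolution.
    Assumption: the mask is min-max normalised to [0, 1] and shared by
    all channels.
    """
    _, h, w = feature_maps.shape
    mask = resize_nearest(saliency, h, w)
    lo, hi = mask.min(), mask.max()
    if hi > lo:
        mask = (mask - lo) / (hi - lo)
    else:
        # A constant saliency map carries no information: leave features unchanged.
        mask = np.ones_like(mask, dtype=float)
    # Broadcasting multiplies every channel by the same (H, W) mask.
    return feature_maps * mask[None, :, :]


# Usage: two 4x4 feature maps masked by an 8x8 saliency map whose
# salient region is the central 4x4 block.
features = np.ones((2, 4, 4))
saliency = np.zeros((8, 8))
saliency[2:6, 2:6] = 1.0
masked = sal_mask(features, saliency)
```

In this sketch, responses outside the salient region are suppressed towards zero while those inside pass through unchanged, which is the sense in which the mask selects a region of the visual field.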

Keywords

Saliency; Human visual system; Convolutional neural network; Image classification; Machine learning