Article ID: 408217
Journal: Neurocomputing
Published Year: 2016
Pages: 7 Pages
File Type: PDF
Abstract

Deep neural networks have recently shown impressive performance on several machine learning tasks. An important approach to training deep networks, especially useful when labeled data is scarce, relies on unsupervised pretraining of the hidden layers followed by supervised fine-tuning. One of the most widely used approaches to unsupervised pretraining is to train each layer with the Contrastive Divergence (CD) algorithm. In this work we present a modification of CD aimed at learning more diverse sets of features in the hidden layers. Specifically, we extend the CD learning rule to penalize the cosines of the angles between weight vectors, which in turn encourages orthogonality between the learned features. We demonstrate experimentally that this extension to CD improves the performance of pretrained deep networks on image recognition and document retrieval tasks.
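The abstract describes adding a penalty on the cosines of angles between weight vectors to the CD learning rule. A minimal NumPy sketch of that idea is given below; it assumes an RBM with binary units, uses a squared-cosine penalty between hidden-unit weight columns, and omits bias terms for brevity. The function name cd1_ortho_update, the penalty strength lam, and the exact penalty form are illustrative assumptions, not the paper's notation.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_ortho_update(W, v0, lr=0.01, lam=0.001):
    # W: (n_visible, n_hidden) weights; v0: (batch, n_visible) binary data.
    # Positive phase: hidden activations driven by the data.
    h0_prob = sigmoid(v0 @ W)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)

    # Negative phase: one Gibbs step (CD-1), biases omitted for brevity.
    v1_prob = sigmoid(h0 @ W.T)
    h1_prob = sigmoid(v1_prob @ W)

    # Standard CD-1 gradient estimate (ascent direction on the likelihood).
    grad_cd = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]

    # Orthogonality term: gradient of the sum over pairs (i < j) of
    # cos(w_i, w_j)^2, where w_i are columns of W (one vector per hidden unit).
    norms = np.linalg.norm(W, axis=0, keepdims=True)   # (1, n_hidden)
    Wn = W / norms                                     # unit-norm columns
    C = Wn.T @ Wn                                      # pairwise cosines
    np.fill_diagonal(C, 0.0)                           # ignore self-similarity
    grad_ortho = 2.0 * (Wn @ C - Wn * (C ** 2).sum(axis=0, keepdims=True)) / norms

    # Gradient ascent on the CD objective, penalized toward orthogonal features.
    return W + lr * (grad_cd - lam * grad_ortho)

# Toy usage: 6 visible units, 4 hidden features, a random binary batch.
W = 0.01 * rng.standard_normal((6, 4))
v = (rng.random((16, 6)) < 0.5).astype(float)
for _ in range(200):
    W = cd1_ortho_update(W, v)

As the penalty drives the pairwise cosines toward zero, the hidden units are pushed toward mutually orthogonal weight vectors, which is the feature-diversity effect the abstract describes.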

Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence