Article ID: 6865241
Journal: Neurocomputing
Published Year: 2018
Pages: 21
File Type: PDF
Abstract
Deep neural networks have achieved remarkable breakthroughs in a variety of artificial intelligence tasks; however, they are notorious for consuming substantial hardware resources, training time, and power. Emerging pruning and binarization methods, which aim to reduce these overheads while retaining high performance, promise deployment on portable devices. However, even with these state-of-the-art algorithms, the full-precision weights must still be stored during gradient descent, leaving memory access and the resulting computation as size and power bottlenecks. To address this challenge, we propose a unified discrete state transition (DST) framework that introduces a probabilistic projection operator to constrain the weight matrices to a discrete weight space (DWS) with a configurable number of states throughout the entire training process. Experimental results on several data sets, including MNIST, CIFAR10, and SVHN, demonstrate the effectiveness of this framework. The direct transition between discrete states significantly reduces the memory needed to store full-precision weights and simplifies the weight-update computation. The proposed DST framework is hardware friendly, as it can be readily implemented on a wide range of emerging portable devices, including binary, ternary, and multiple-level memory devices. This work paves the way for on-chip learning on various portable devices in the near future.
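The abstract describes a probabilistic projection operator that keeps weights on a discrete grid for the whole of training, so no full-precision copy is ever stored. Below is a minimal NumPy sketch of that general idea, assuming a ternary weight space and a stochastic-rounding style transition rule; the function name `probabilistic_project`, the learning-rate scaling, and the specific hop probability are illustrative assumptions, not the paper's actual DST update.

```python
import numpy as np

def probabilistic_project(w_discrete, grad, lr, states=np.array([-1.0, 0.0, 1.0]),
                          rng=np.random.default_rng(0)):
    """Illustrative stochastic projection of a gradient step back onto a
    discrete weight space (DWS): weights only ever occupy grid states."""
    delta = states[1] - states[0]        # spacing between adjacent states
    step = -lr * grad                    # proposed continuous update
    # Probability of hopping one state in the direction of the update,
    # proportional to the step size relative to the state spacing.
    p_hop = np.clip(np.abs(step) / delta, 0.0, 1.0)
    hop = (rng.random(w_discrete.shape) < p_hop) * np.sign(step) * delta
    # Clip back into the allowed discrete range; result stays on the grid.
    return np.clip(w_discrete + hop, states.min(), states.max())

# Toy usage: weights remain exactly in {-1.0, 0.0, 1.0} after every update.
w = np.array([1.0, 0.0, -1.0, 0.0])
g = np.array([0.3, -0.8, -0.2, 0.5])
for _ in range(5):
    w = probabilistic_project(w, g, lr=0.5)
print(w)
```

Because each update is a probabilistic jump between neighboring discrete states rather than an accumulation in full precision, this kind of rule maps naturally onto binary, ternary, and multi-level memory devices, which is the hardware-friendliness the abstract emphasizes.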
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence