Article ID: 6903651
Journal: Applied Soft Computing
Published Year: 2018
Pages: 39 Pages
File Type: PDF
Abstract
Recently, deep recurrent neural networks (DRNNs) have been widely proposed for language modeling. By stacking multiple recurrent layers, DRNNs can learn higher-level features of the input data and therefore achieve better performance than single-layer models. However, because of their simple linear stacking pattern, the gradient information vanishes when it is propagated backward through too many layers. As a result, DRNNs become hard to train, and their performance degrades rapidly as the number of recurrent layers increases. To address this problem, this paper proposes the feature memory-based deep recurrent neural network (FMDRNN). FMDRNN introduces a new stacking pattern built around a special feature memory module (FM), which allows the hidden units of each layer to see and reuse all the features generated by the preceding stacked layers, not just the features from the previous layer as in DRNNs. The FM acts like a traffic hub that provides direct connections between every pair of layers, and the attention network inside the FM controls the switching of these connections. These direct connections allow FMDRNN to alleviate gradient vanishing during backward propagation and prevent the learned features from being washed away before they reach the end of the network. FMDRNN is evaluated through extensive experiments on the widely used English Penn Treebank dataset and five more complex non-English language corpora. The experimental results show that FMDRNN can be trained effectively even when a large number of layers are stacked, so that it benefits from deeper networks instead of suffering performance degradation, and it consistently achieves markedly better results than other models with a deeper but thinner network.
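The abstract does not give the exact equations of the FM module, so the following is only a minimal sketch of the general idea it describes: a stack of recurrent layers in which each layer reads an attention-weighted mix of all previously produced feature maps instead of only the output of the layer directly below it. The class name FeatureMemoryRNN, the use of LSTM layers, and the simple per-layer linear attention scorer are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical feature-memory-style stacked RNN (illustrative only).
# Each layer attends over ALL earlier features ("traffic hub"), so
# gradients and features have direct paths across the whole stack.
import torch
import torch.nn as nn


class FeatureMemoryRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size, num_layers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.ModuleList(
            [nn.LSTM(hidden_size, hidden_size, batch_first=True)
             for _ in range(num_layers)]
        )
        # One attention scorer per layer: scores every stored feature map.
        self.attn = nn.ModuleList(
            [nn.Linear(hidden_size, 1) for _ in range(num_layers)]
        )
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens):
        # The feature memory starts with the embedded input sequence.
        memory = [self.embed(tokens)]                         # list of (B, T, H)
        for lstm, score in zip(self.layers, self.attn):
            stacked = torch.stack(memory, dim=0)              # (L, B, T, H)
            # Softmax over the stored layers: the "switch" on the connections.
            weights = torch.softmax(score(stacked).squeeze(-1), dim=0)  # (L, B, T)
            mixed = (weights.unsqueeze(-1) * stacked).sum(dim=0)        # (B, T, H)
            feature, _ = lstm(mixed)
            memory.append(feature)                            # reusable by later layers
        return self.out(memory[-1])                           # (B, T, vocab_size) logits


if __name__ == "__main__":
    model = FeatureMemoryRNN(vocab_size=1000, hidden_size=64, num_layers=4)
    logits = model(torch.randint(0, 1000, (2, 12)))           # batch of 2, length 12
    print(logits.shape)                                        # torch.Size([2, 12, 1000])
```

Because every layer draws directly from the whole memory, the deepest layer is never more than one attention step away from any earlier feature, which is the mechanism the abstract credits for easier training of deeper stacks.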
Related Topics
Physical Sciences and Engineering; Computer Science; Computer Science Applications
Authors