Article code: 6903651
Journal code: 1446992
Publication year: 2018
English article: 39-page PDF
Full-text version: Free download
English title of the ISI article
Feature memory-based deep recurrent neural network for language modeling
Keywords
Memory and attention, recurrent neural network, deep learning, language modeling
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
English abstract
Recently, deep recurrent neural networks (DRNNs) have been widely proposed for language modeling. DRNNs can learn higher-level features of the input data by stacking multiple recurrent layers, which lets them achieve better performance than single-layer models. However, because of their simple linear stacking pattern, gradient information vanishes when it is back-propagated through too many layers. As a result, DRNNs become hard to train, and their performance degrades rapidly as the number of recurrent layers increases. To address this problem, this paper proposes the feature memory-based deep recurrent neural network (FMDRNN). FMDRNN introduces a new stacking pattern built around a special feature memory module (FM), which lets the hidden units of each layer see and reuse all the features generated by the preceding stacked layers, not just the feature from the previous layer as in DRNNs. The FM acts like a traffic hub that provides direct connections between every pair of layers, and the attention network in the FM controls the switching of these connections. These direct connections allow FMDRNN to alleviate gradient vanishing during backward propagation and also prevent the learned features from being washed out before they reach the end of the network. FMDRNN is evaluated through extensive experiments on the widely used English Penn Treebank dataset and five more complex non-English language corpora. The experimental results show that FMDRNN can be trained effectively even when a large number of layers are stacked, so it benefits from deeper networks instead of degrading in performance, and it consistently achieves markedly better results than other models with a deeper but thinner network.
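The stacking pattern the abstract describes can be sketched roughly as follows: each layer reads an attention-weighted mixture of all features stored so far in a feature memory, then writes its own output back into that memory. This is a minimal NumPy toy, assuming softmax attention over the stored features, plain tanh recurrent cells, and per-layer query vectors — the paper's actual cell type, scoring function, and parameterization are not given here and are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_rnn_step(x, h, W_x, W_h):
    # One plain (Elman-style) recurrent step; the paper's layers may well be
    # LSTM/GRU cells — this cell is only a stand-in.
    return np.tanh(x @ W_x + h @ W_h)

def attention_read(memory, query_w):
    # Feature-memory read: softmax attention over every stored layer feature.
    # `query_w` is a hypothetical per-layer scoring vector, not from the paper.
    feats = np.stack(memory)                 # (n_stored, d)
    scores = feats @ query_w                 # (n_stored,)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ feats                   # attention-weighted mixture, (d,)

d, n_layers = 8, 4
x = rng.standard_normal(d)                   # one input feature vector
hidden = [np.zeros(d) for _ in range(n_layers)]
W_x = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_layers)]
W_h = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_layers)]
q = [rng.standard_normal(d) for _ in range(n_layers)]

memory = [x]                                 # FM starts with the input feature
for l in range(n_layers):
    inp = attention_read(memory, q[l])       # layer l sees ALL earlier features
    hidden[l] = simple_rnn_step(inp, hidden[l], W_x[l], W_h[l])
    memory.append(hidden[l])                 # new feature written back into FM

print(len(memory))                           # prints 5: input + one per layer
```

Because every layer draws directly on every earlier feature, the gradient has a short path back to any layer, which is the mechanism the abstract credits for alleviating vanishing gradients in deep stacks.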
Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 68, July 2018, Pages 432-446
Authors
, , ,