Article ID | Journal | Published Year | Pages | File Type
---|---|---|---|---
410604 | Neurocomputing | 2009 | 5 |
Abstract
We show how to improve a state-of-the-art neural network language model that converts the previous “context” words into feature vectors and combines these feature vectors linearly to predict the feature vector of the next word. Significant improvements in predictive accuracy are achieved by using a non-linear subnetwork to modulate the effects of the context words or to produce a non-linear correction term when predicting the feature vector. A log-bilinear language model that incorporates both of these improvements achieves a 26% reduction in perplexity over the best n-gram model on a fairly large dataset.
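The abstract describes a log-bilinear language model: each context word's feature vector is transformed linearly (one matrix per context position), the results are summed into a predicted feature vector, and every candidate word is scored by the dot product between its own feature vector and the prediction. A minimal sketch of that linear-combination step, with toy dimensions and randomly initialized parameters standing in for trained ones (all names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, N = 50, 16, 3   # toy vocabulary size, feature dimension, context length

R = rng.normal(scale=0.1, size=(V, D))     # one feature vector per word
C = rng.normal(scale=0.1, size=(N, D, D))  # one combination matrix per context position
b = np.zeros(V)                            # per-word bias terms

def predict_next(context_ids):
    """Log-bilinear prediction: combine the context words' feature
    vectors linearly, then score each word by the dot product of its
    feature vector with the predicted vector and take a softmax."""
    r_hat = sum(C[i] @ R[w] for i, w in enumerate(context_ids))
    scores = R @ r_hat + b
    exp = np.exp(scores - scores.max())    # numerically stable softmax
    return exp / exp.sum()                 # distribution over the next word

p = predict_next([3, 7, 11])
```

The paper's improvements replace parts of this pipeline with non-linear components: a subnetwork that modulates the per-position contributions, or one that adds a non-linear correction to `r_hat` before scoring.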
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Andriy Mnih, Zhang Yuecheng, Geoffrey Hinton