Article code, Journal code, Publication year, English article, Full-text version
558439, 874929, 2012, 26 pages (PDF), Free download
English title of the ISI article
The latent words language model
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
Abstract

We present a new generative model of natural language, the latent words language model. This model uses a latent variable for every word in a text that represents synonyms or related words in the given context. We develop novel methods to train this model and to find the expected value of these latent variables for a given unseen text. The learned word similarities help to reduce the sparseness problems of traditional n-gram language models. We show that the model significantly outperforms interpolated Kneser–Ney smoothing and class-based language models on three different corpora. Furthermore, the latent variables are useful features for information extraction. We show that, for both semantic role labeling and word sense disambiguation, the performance of a supervised classifier increases when these variables are incorporated as extra features. This improvement is especially large when only a small annotated corpus is used for training.


► We propose a novel generative model for learning synonyms and semantically related words from texts.
► The model improves word sense disambiguation.
► The model reduces the need for supervision in information extraction tasks.
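
The abstract's central idea, a latent variable per word that stands for synonyms or related words in context, can be illustrated with a toy calculation. This is only a minimal sketch under stated assumptions, not the authors' training or inference method: it assumes each observed word is emitted by one hidden word, that hidden words follow a bigram distribution, and that the posterior over the hidden word is proportional to the product of the two. The function name `latent_posterior` and every probability below are hypothetical.

```python
def latent_posterior(observed, prev_hidden, emission, transition):
    """Toy posterior P(h | observed, prev_hidden).

    Assumption: P(h | w, h_prev) is proportional to
    P(w | h) * P(h | h_prev), where `emission[h][w]` holds P(w | h)
    and `transition[h_prev][h]` holds P(h | h_prev).
    """
    scores = {
        h: emission.get(h, {}).get(observed, 0.0) * p
        for h, p in transition.get(prev_hidden, {}).items()
    }
    total = sum(scores.values())
    if total == 0.0:
        # No hidden word can explain the observation under these tables.
        return {}
    return {h: s / total for h, s in scores.items()}


# Hypothetical toy parameters, invented for illustration only.
transition = {"the": {"car": 0.5, "automobile": 0.1, "dog": 0.4}}
emission = {
    "car": {"car": 0.7, "automobile": 0.3},
    "automobile": {"automobile": 0.6, "car": 0.4},
    "dog": {"dog": 1.0},
}

# The rare observed word "automobile" is partly explained by the
# frequent hidden word "car", so it can share that word's statistics.
posterior = latent_posterior("automobile", "the", emission, transition)
```

In this toy setup the infrequent word borrows probability mass from a more frequent, related hidden word, which is the intuition behind the reduced sparseness the abstract describes. A posterior of this kind could also serve as an extra feature vector for a downstream classifier.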

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 26, Issue 5, October 2012, Pages 384–409