Article code: 534417
Journal code: 870250
Publication year: 2014
English article: 9 pages (PDF)
Full-text version: free download
English title of the ISI article
Lexicon expansion for latent variable grammars
Related topics
Engineering and basic sciences, Computer engineering, Computer vision and pattern recognition
English abstract


• We proposed a lexicon expansion approach to improve latent variable grammars.
• The lexicon expansion is based on a transductive graph propagation technique.
• We constructed a word-level k-NN similarity graph over labeled and unlabeled data.
• We used an unnormalized propagation algorithm to infer emission probabilities.
• Lexicon expansion with self-training can further improve latent variable grammars.

This study investigates the use of unlabeled data, i.e., raw text, to strengthen latent variable probabilistic context-free grammars, in particular their lexical models. A graph-based lexicon expansion approach is proposed to achieve this goal. It aims to discover additional lexical knowledge from a large amount of unlabeled data to help syntactic parsing. The proposed approach is based on a transductive graph-based label propagation technique. It builds k-nearest-neighbor (k-NN) similarity graphs over the words of the labeled and unlabeled data and uses them to propagate lexical emission probabilities. The intuition is that different words occurring in similar syntactic environments should have similar lexical emission distributions. The derived words, together with their lexical emission probabilities, are incorporated into the parser. The approach is particularly effective for parsing out-of-vocabulary (OOV) words. Empirical results for English, Chinese, and Portuguese demonstrate its effectiveness.
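
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the context-count feature vectors, the cosine similarity, the tag names, and the function names `knn_graph` and `propagate` are all assumptions made for illustration, and the update rule is a simple clamped weighted average rather than the unnormalized propagation algorithm named in the highlights.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_graph(features, k):
    """Link each word to its k most similar words, weighted by cosine similarity."""
    graph = {}
    for word, vec in features.items():
        sims = [(cosine(vec, other_vec), other)
                for other, other_vec in features.items() if other != word]
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[word] = {other: s for s, other in sims[:k] if s > 0.0}
    return graph

def propagate(graph, seeds, tags, iterations=10):
    """Spread tag distributions over the graph; labeled (seed) words stay clamped."""
    scores = {w: {t: seeds.get(w, {}).get(t, 0.0) for t in tags} for w in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = dict(scores[word])  # keep labeled words fixed
                continue
            total = sum(neighbours.values())
            updated[word] = {
                t: (sum(wt * scores[o][t] for o, wt in neighbours.items()) / total
                    if total > 0 else 0.0)
                for t in tags
            }
        scores = updated
    return scores

# Hypothetical toy data: each vector counts how often a word appears in
# three made-up syntactic contexts. "puppy" is the unlabeled (OOV) word.
features = {
    "dog":   [4, 1, 0],
    "cat":   [3, 1, 0],
    "run":   [0, 1, 4],
    "walk":  [0, 2, 3],
    "puppy": [5, 1, 0],
}
seeds = {
    "dog":  {"NN": 1.0},
    "cat":  {"NN": 1.0},
    "run":  {"VB": 1.0},
    "walk": {"VB": 0.9, "NN": 0.1},
}
tags = ["NN", "VB"]

result = propagate(knn_graph(features, k=3), seeds, tags)
print(result["puppy"])
```

In this toy setting the unlabeled word ends up with most of its mass on the noun tag because its nearest neighbours are labeled nouns. In a real parser the propagated scores would be turned into emission probabilities for the grammar's latent tag categories.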

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 42, 1 June 2014, Pages 47–55