Article code: 565888
Journal code: 1452027
Publication year: 2015
English article: 17-page PDF
Full-text version: Free download
English Title of the ISI Article
Integrating meta-information into recurrent neural network language models
Keywords
Recurrent neural network, language models, part-of-speech, socio-situational setting, topic, token size
Related Subjects
Engineering and Physical Sciences › Computer Engineering › Signal Processing
English Abstract


• Recurrent neural network language models benefit from the integration of meta-information.
• Meta-information of various types, word-level, discourse-level, and intrinsic, is valuable.
• If meta-information can be robustly predicted, it has potential to improve performance.
• Intrinsic meta-information, word- and sentence-length, is trivial to obtain, but still useful.
• Approach is validated with the Spoken Dutch Corpus and the Wall Street Journal data set.

Due to their advantages over conventional n-gram language models, recurrent neural network language models (rnnlms) recently have attracted a fair amount of research attention in the speech recognition community. In this paper, we explore one advantage of rnnlms, namely, the ease with which they allow the integration of additional knowledge sources. We concentrate on features that provide complementary information w.r.t. the lexical identities of the words. We refer to such information as meta-information. We single out three cases and investigate their merits by means of N-best list re-scoring experiments on a challenging corpus of spoken Dutch (referred to as cgn) as well as on the English Wall Street Journal (wsj) corpus. First, we look at Parts of Speech (POS) tags and lemmas, two sources of word-level linguistic information that are known to make a contribution to the performance of conventional language models. We confirm that rnnlms can benefit from these sources as well. Second, we investigate socio-situational settings (ssss) and topics, two sources of discourse-level information that are also known to benefit language models. ssss are present in the cgn data, and can be seen as a proxy for the language register. For the purposes of our investigation, we assume that information on the sss can be captured at the moment at which speech is recorded. Topics, i.e., treatments of different subjects, are present in the wsj data. In order to predict POS, lemmas, sss and topic, a second rnnlm is coupled to the main rnnlm. We refer to this architecture as a recurrent neural network tandem language model (rnntlm). Our experimental findings show that if high-quality meta-information labels are available, both word-level and discourse-level information improve performance of language models. Third, we investigate sentence length and word length (i.e., token size), two sources of intrinsic information that are readily available for exploitation because they are known at the time of re-scoring. Intrinsic information has been largely overlooked by language modeling research. The results of both experiments on cgn data and wsj data show that integrating sentence length and word length can achieve improvement. rnnlms allow these features to be incorporated with ease, and obtain improved performance.
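To make the input-level integration described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a word-level meta-information stream (POS tags are used here as the example) is embedded and concatenated with the word embedding before being fed to the recurrent layer. The class name MetaRNNLM, the layer sizes, and the toy usage are illustrative assumptions; the paper's rnntlm additionally predicts the meta labels with a second, coupled rnnlm rather than assuming gold labels.

```python
# Minimal sketch (assumed names/sizes): an RNN language model whose input
# concatenates a word embedding with a meta-information (e.g. POS tag) embedding.
import torch
import torch.nn as nn


class MetaRNNLM(nn.Module):
    def __init__(self, vocab_size, n_tags, word_dim=128, tag_dim=16, hidden_dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.tag_emb = nn.Embedding(n_tags, tag_dim)       # meta-information embedding
        self.rnn = nn.RNN(word_dim + tag_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)       # next-word prediction layer

    def forward(self, words, tags, hidden=None):
        # words, tags: (batch, seq_len) integer indices
        x = torch.cat([self.word_emb(words), self.tag_emb(tags)], dim=-1)
        h, hidden = self.rnn(x, hidden)
        return self.out(h), hidden


# Toy usage: per-position next-word log-probabilities, as used in N-best re-scoring.
model = MetaRNNLM(vocab_size=10000, n_tags=50)
words = torch.randint(0, 10000, (1, 12))
tags = torch.randint(0, 50, (1, 12))
logits, _ = model(words, tags)
log_probs = torch.log_softmax(logits, dim=-1)
```

In this kind of setup, discourse-level features such as the socio-situational setting or topic, and intrinsic features such as sentence and word length, can be appended in the same way as additional input streams.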

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 73, October 2015, Pages 64–80