کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
558427 874926 2007 17 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Backoff hierarchical class n-gram language models: effectiveness to model unseen events in speech recognition
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر پردازش سیگنال
پیش نمایش صفحه اول مقاله
Backoff hierarchical class n-gram language models: effectiveness to model unseen events in speech recognition
چکیده انگلیسی

In this paper, we introduce the backoff hierarchical class n-gram language models to better estimate the likelihood of unseen n-gram events. This multi-level class hierarchy language modeling approach generalizes the well-known backoff n-gram language modeling technique. It uses a class hierarchy to define word contexts. Each node in the hierarchy is a class that contains all the words of its descendant nodes. The closer a node to the root, the more general the class (and context) is. We investigate the effectiveness of the approach to model unseen events in speech recognition. Our results illustrate that the proposed technique outperforms backoff n-gram language models. We also study the effect of the vocabulary size and the depth of the class hierarchy on the performance of the approach. Results are presented on Wall Street Journal (WSJ) corpus using two vocabulary set: 5000 words and 20,000 words. Experiments with 5000 word vocabulary, which contain a small numbers of unseen events in the test set, show up to 10% improvement of the unseen event perplexity when using the hierarchical class n-gram language models. With a vocabulary of 20,000 words, characterized by a larger number of unseen events, the perplexity of unseen events decreases by 26%, while the word error rate (WER) decreases by 12% when using the hierarchical approach. Our results suggest that the largest gains in performance are obtained when the test set contains a large number of unseen events.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computer Speech & Language - Volume 21, Issue 1, January 2007, Pages 88–104
نویسندگان
,