Article ID | Journal | Published Year | Pages |
---|---|---|---|
557994 | Computer Speech & Language | 2006 | 20 Pages |
Abstract
Language modeling for large-vocabulary conversational Arabic speech recognition is faced with the problem of the complex morphology of Arabic, which increases the perplexity and out-of-vocabulary rate. This problem is compounded by the enormous dialectal variability and differences between spoken and written language. In this paper, we investigate improvements in Arabic language modeling by developing various morphology-based language models. We present four different approaches to morphology-based language modeling, including a novel technique called factored language models. Experimental results are presented for both rescoring and first-pass recognition experiments.
Related Topics
Physical Sciences and Engineering
Computer Science
Signal Processing
Authors
Katrin Kirchhoff, Dimitra Vergyri, Jeff Bilmes, Kevin Duh, Andreas Stolcke