Article code: 565042
Journal code: 875668
Publication year: 2006
English article: 19-page PDF
Full-text version: Free download
English title of the ISI article
A unified language model for large vocabulary continuous speech recognition of Turkish
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract

We have designed a Turkish dictation system for a newspaper content transcription application. Turkish is an agglutinative language with free word order. When words are used as recognition units, these characteristics lead to vocabulary explosion, a large number of out-of-vocabulary (OOV) words, and increased complexity of n-gram language models in speech recognition. In this paper, alternative language modeling units such as “stems and endings”, “stems and morphemes”, and “syllables” are investigated instead of “words”. These recognition units are compared in terms of vocabulary size, coverage, bigram perplexity, and speech recognition performance. A combined model is proposed that aims to balance the OOV rate against the amount of phoneme sequence constraints on recognition units. The proposed model yielded letter error rates (LERs) of approximately 28% for a speaker-independent system and 20% for a speaker-dependent system. These error rates are lower than those of the traditional word-based model for the newspaper content transcription application.
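
As a rough illustration of the letter error rate reported in the abstract, the sketch below computes a character-level edit distance between a reference transcript and a recognizer hypothesis, then divides it by the number of reference letters. The abstract does not define the metric precisely, so two things here are assumptions: the Levenshtein-based definition, and the choice to ignore spaces so that different unit boundaries (words, stems and endings, syllables) are not penalized. The function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference letters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def letter_error_rate(reference: str, hypothesis: str) -> float:
    """Letter errors divided by reference length (assumed definition)."""
    # Assumption: spaces are dropped so that unit segmentation does not count as an error.
    ref_letters = reference.replace(" ", "")
    hyp_letters = hypothesis.replace(" ", "")
    if not ref_letters:
        raise ValueError("reference must contain at least one letter")
    return edit_distance(ref_letters, hyp_letters) / len(ref_letters)
```

Under this definition, a hypothesis that splits a word into a stem and an ending but gets every letter right scores an LER of zero, which is one reason a letter-level metric suits comparisons across different recognition units.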

Publisher
Database: Elsevier - ScienceDirect
Journal: Signal Processing - Volume 86, Issue 10, October 2006, Pages 2844–2862