Article code: 568764
Journal code: 876458
Publication year: 2010
English article: 13-page PDF
Full-text version: Free download
English title of the ISI article
Joint acoustic and language modeling for speech recognition
Related subjects
Engineering and Basic Sciences › Computer Engineering › Signal Processing
First-page preview of the article
English abstract

In the traditional model of speech recognition, the acoustic and linguistic information sources are assumed to be independent of each other: the parameters of the hidden Markov model (HMM) and the n-gram are estimated separately and then combined for maximum a posteriori classification. However, speech features and lexical words are inherently correlated in natural language, and the lack of coupling between the two models introduces inefficiencies. This paper reports on joint acoustic and linguistic modeling for speech recognition, using the acoustic evidence to estimate the linguistic model parameters, and vice versa, according to the maximum entropy (ME) principle. Discriminative ME (DME) models are exploited by using features from competing sentences. Moreover, a mutual ME (MME) model is built for the sentence posterior probability, which is maximized to estimate the model parameters by characterizing the dependence between acoustic and linguistic features. An N-best Viterbi approximation is presented for implementing the DME and MME models. Additionally, the new models incorporate high-order feature statistics and word regularities. In the experiments, the proposed methods increase the sentence posterior probability or model separation. Compared with separately estimated HMM and n-gram models, recognition errors are significantly reduced: from 32.2% to 27.4% on the MATBN corpus and from 5.4% to 4.8% on the WSJ corpus (5K condition).
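To make the baseline concrete, here is a minimal sketch of the separately estimated setup the abstract contrasts against: an N-best list is rescored by simply adding the HMM acoustic log-likelihood and a weighted n-gram log-probability, with no joint training between the two. The hypotheses, scores, and the `lm_weight` value are hypothetical illustrations, not data from the paper; the paper's DME/MME approach replaces this independent combination with jointly estimated ME models.

```python
def rescore_nbest(hypotheses, lm_weight=0.8):
    """Rank N-best hypotheses by a combined score.

    Each hypothesis is (sentence, acoustic_logprob, lm_logprob).
    In the separate-estimation baseline the acoustic (HMM) and
    linguistic (n-gram) scores are trained independently and only
    combined here, at decoding time, via a scaling weight.
    """
    scored = [(sent, ac + lm_weight * lm) for sent, ac, lm in hypotheses]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best first
    return scored

# Toy N-best list with made-up log scores (natural log domain).
nbest = [
    ("recognize speech", -120.0, -8.5),
    ("wreck a nice beach", -118.0, -14.2),
]
best_sentence = rescore_nbest(nbest)[0][0]
```

The second hypothesis wins on acoustics alone but loses once the language model score is added, which is exactly the kind of interaction the joint ME estimation aims to capture during training rather than only at decoding time.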

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 52, Issue 3, March 2010, Pages 223–235
Authors