Joint speaker and environment adaptation using TensorVoice for robust speech recognition

Article ID	Journal ID	Publication year	English article
568644	1452040	2014	10-page PDF

Keywords

Environment adaptation; Tensor analysis; Speech recognition; Speaker adaptation; Acoustic model adaptation

Related subjects

Engineering and Basic Sciences > Computer Engineering > Signal Processing

Abstract

• We present a speaker and noise adaptation method for robust speech recognition.
• The adaptation model is obtained from a tensor analysis of acoustic models.
• The adaptation model has two weight vectors, one for the speaker and one for the noise.
• The presented method outperforms eigenvoice adaptation.

We present an adaptation of a hidden Markov model (HMM)-based automatic speech recognition system to the target speaker and noise environment. Given HMMs built from various speakers and noise conditions, we build tensorvoices that capture the interaction between the speaker and noise by using a tensor decomposition. We express the updated model for the target speaker and noise environment as a product of the tensorvoices and two weight vectors, one each for the speaker and noise. An iterative algorithm is presented to determine the weight vectors in the maximum likelihood (ML) framework. With the use of separate weight vectors, the tensorvoice approach can adapt to the target speaker and noise environment differentially, whereas the eigenvoice approach, which is based on a matrix decomposition technique, cannot differentially adapt to those two factors. In supervised adaptation tests using the AURORA4 corpus, the relative improvement of performance obtained by the tensorvoice method over the eigenvoice method is approximately 10% on average for adaptation data of 6–24 s in length, and the relative improvement of performance obtained by the tensorvoice method over the maximum likelihood linear regression (MLLR) method is approximately 5.4% on average for adaptation data of 6–18 s in length. Therefore, the tensorvoice approach is an efficient method for speaker and noise adaptation.
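The core idea of the abstract, an adapted model expressed as a product of tensorvoices and two weight vectors that are estimated iteratively, can be illustrated with a small sketch. Several assumptions are made here that do not come from the paper. The tensorvoices are taken as an already-computed 3-way core tensor `core[d][i][j]` (supervector dimension, speaker index, noise index); the decomposition that produces it is not shown. A mean supervector offset `bias` is added, as is common in eigenvoice adaptation. Most importantly, the paper's maximum likelihood estimation from HMM statistics is replaced by a least-squares surrogate fitted to a directly observed target supervector, which matches ML only under unit-variance Gaussians. All function names (`compose`, `solve`, `least_squares`, `estimate_weights`) are hypothetical. The sketch shows why alternating estimation works: with one weight vector fixed, the model is linear in the other.

```python
# Hypothetical sketch of bilinear (speaker x noise) supervector adaptation.
# NOT the paper's algorithm: least squares stands in for HMM-based ML estimation.
from typing import List, Tuple

Vector = List[float]
Tensor3 = List[List[List[float]]]  # core[d][i][j]: dim d, speaker index i, noise index j


def compose(core: Tensor3, bias: Vector, w_s: Vector, w_n: Vector) -> Vector:
    """Adapted supervector: bias + core x_2 w_s x_3 w_n (mode-2 and mode-3 products)."""
    return [
        bias[d] + sum(core[d][i][j] * w_s[i] * w_n[j]
                      for i in range(len(w_s)) for j in range(len(w_n)))
        for d in range(len(bias))
    ]


def solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(cols: List[Vector], y: Vector, ridge: float) -> Vector:
    """Minimise ||A w - y||^2 + ridge * ||w||^2 via the normal equations (A given by columns)."""
    k = len(cols)
    gram = [[sum(p * q for p, q in zip(cols[i], cols[j])) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    return solve(gram, rhs)


def estimate_weights(core: Tensor3, bias: Vector, target: Vector,
                     n_iter: int = 20, ridge: float = 1e-8) -> Tuple[Vector, Vector]:
    """Alternate between speaker and noise weights; each half-step is linear least squares."""
    dim, ks, kn = len(core), len(core[0]), len(core[0][0])
    y = [t - b for t, b in zip(target, bias)]
    w_s, w_n = [1.0] * ks, [1.0] * kn  # arbitrary starting point (an assumption)
    for _ in range(n_iter):
        # Fix w_n: column i of the design matrix is sum_j core[:, i, j] * w_n[j].
        cols_s = [[sum(core[d][i][j] * w_n[j] for j in range(kn)) for d in range(dim)]
                  for i in range(ks)]
        w_s = least_squares(cols_s, y, ridge)
        # Fix w_s: column j of the design matrix is sum_i core[:, i, j] * w_s[i].
        cols_n = [[sum(core[d][i][j] * w_s[i] for i in range(ks)) for d in range(dim)]
                  for j in range(kn)]
        w_n = least_squares(cols_n, y, ridge)
    return w_s, w_n
```

Note that only the outer product of the two weight vectors is identifiable: scaling `w_s` by c and `w_n` by 1/c gives the same adapted supervector, so a check should compare reconstructed supervectors rather than individual weights. The small `ridge` term only keeps the normal equations solvable if a design column collapses to zero.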

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 58, March 2014, Pages 1–10

Authors

Yongwon Jeong
