Rapid speaker adaptation in latent speaker space with non-negative matrix factorization

Article code	Journal code	Publication year	English article	Full-text version
568678	876437	2013	16-page PDF	Free download


Keywords

NMF; SAT; Speaker adaptation

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing


Abstract

• Fast speaker adaptation by adaptation of Gaussian mixture weights is investigated.
• The weights are expressed as a linear combination of latent speaker vectors, which are learnt by non-negative matrix factorization.
• Adaptation on 3 s of data performs as well as feature-space maximum likelihood linear regression (fMLLR) on 100 s of data.
• The combination with speaker adaptive training, fMLLR and eigenvoices yields a further reduction in word error rate.

A novel speaker adaptation algorithm based on Gaussian mixture weight adaptation is described. A small number of latent speaker vectors are estimated with non-negative matrix factorization (NMF). These latent vectors encode the distinctive systematic patterns of Gaussian usage observed when modeling the individual speakers that make up the training data. Expressing the speaker dependent Gaussian mixture weights as a linear combination of a small number of latent vectors reduces the number of parameters that must be estimated from the enrollment data. The resulting fast adaptation algorithm, using 3 s of enrollment data only, achieves similar performance as fMLLR adapting on 100+ s of data. In order to learn richer Gaussian usage patterns from the training data, the NMF-based weight adaptation is combined with vocal tract length normalization (VTLN) and speaker adaptive training (SAT), or with a simple Gaussian exponentiation scheme that lowers the dynamic range of the Gaussian likelihoods. Evaluation on the Wall Street Journal tasks shows a 5% relative word error rate (WER) reduction over the speaker independent recognition system which already incorporates VTLN. The WER can be lowered further by combining weight adaptation with Gaussian mean adaptation by means of eigenvoice speaker adaptation.
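The core idea in the abstract, writing a speaker's Gaussian mixture weights as a non-negative combination of a few latent speaker vectors, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single GMM, a toy matrix `V` of per-speaker Gaussian occupancies invented for illustration, and the standard multiplicative updates for KL-divergence NMF. The function names (`learn_latent_speakers`, `adapt_weights`, `update_H`) are hypothetical.

```python
import random

EPS = 1e-12  # guards against division by zero


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def update_H(V, W, H):
    """One multiplicative KL-divergence update of H with W held fixed."""
    WH = matmul(W, H)
    ratio = [[v / (wh + EPS) for v, wh in zip(vr, whr)] for vr, whr in zip(V, WH)]
    numer = matmul(transpose(W), ratio)
    col_sums = [sum(col) + EPS for col in zip(*W)]
    return [[h * n / col_sums[k] for h, n in zip(H[k], numer[k])]
            for k in range(len(H))]


def learn_latent_speakers(V, rank, iters=200, seed=0):
    """Factorize V (Gaussians x training speakers) as W @ H.

    Columns of W play the role of latent speaker vectors (assumption:
    each is rescaled to sum to one, like a weight distribution).
    """
    rng = random.Random(seed)
    n_gauss, n_spk = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_gauss)]
    H = [[rng.random() + 0.1 for _ in range(n_spk)] for _ in range(rank)]
    for _ in range(iters):
        H = update_H(V, W, H)
        # The W update is the H update applied to the transposed problem.
        W = transpose(update_H(transpose(V), transpose(H), transpose(W)))
    sums = [sum(col) + EPS for col in zip(*W)]
    W = [[w / s for w, s in zip(row, sums)] for row in W]
    H = [[h * sums[k] for h in H[k]] for k in range(rank)]
    return W, H


def adapt_weights(W, counts, iters=100):
    """Estimate a new speaker's mixture weights from sparse occupancy counts.

    Only len(W[0]) coefficients are estimated, not one weight per Gaussian.
    """
    rank = len(W[0])
    v = [[c] for c in counts]
    h = [[1.0 / rank] for _ in range(rank)]
    for _ in range(iters):
        h = update_H(v, W, h)
    w = [row[0] for row in matmul(W, h)]
    total = sum(w)
    return [x / total for x in w]


# Toy data (invented): 4 Gaussians (rows) x 4 training speakers (columns).
V = [
    [0.6, 0.5, 0.0, 0.0],
    [0.4, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.6],
    [0.0, 0.0, 0.3, 0.4],
]
W, H = learn_latent_speakers(V, rank=2)
# A new speaker observed for only a handful of frames.
new_weights = adapt_weights(W, [3, 2, 0, 0])
print(new_weights)
```

With `rank` much smaller than the number of Gaussians, very little enrollment data is needed to estimate the coefficients, which is the property the abstract relies on for adaptation from about 3 s of speech.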

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 55, Issue 9, October 2013, Pages 893–908

Authors

Xueru Zhang, Kris Demuynck, Hugo Van hamme
