|Article code||Journal code||Publication year||English article|
|4973648||1451680||2018||19-page PDF|
- Domain mismatch significantly affects the speaker verification performance.
- Domain invariant linear discriminant analysis (DI-LDA) for compensating domain mismatch in the LDA subspace.
- Domain invariant probabilistic linear discriminant analysis (DI-PLDA) for domain mismatch modelling in the PLDA subspace.
- DI-LDA approach followed by DI-PLDA (DI-PLDA[DI-LDA]) to compensate for domain mismatch in both the LDA and PLDA subspaces.
- Limited target domain data requirement using domain mismatch compensation techniques.
The performance of state-of-the-art i-vector speaker verification systems relies on a large amount of training data for probabilistic linear discriminant analysis (PLDA) modeling. During evaluation, it is also crucial that the target-condition data be well matched to the development data used for PLDA training. However, in many practical scenarios, these systems have to be developed and trained using data from outside the domain of the intended application, since collecting a significant amount of in-domain data is often difficult. Experimental studies have found that PLDA speaker verification performance degrades significantly due to this development/evaluation mismatch. This paper introduces a domain-invariant linear discriminant analysis (DI-LDA) technique for out-domain PLDA speaker verification that compensates for domain mismatch in the LDA subspace. We also propose a domain-invariant probabilistic linear discriminant analysis (DI-PLDA) technique for domain mismatch modeling in the PLDA subspace, using only a small amount of in-domain data. In addition, we propose sequential and score-level combinations of DI-LDA and DI-PLDA to further improve out-domain speaker verification performance. Experimental results show that the proposed domain mismatch compensation techniques yield at least 27% and 14.5% improvement in equal error rate (EER) over a pooled PLDA system for telephone-telephone and interview-interview conditions, respectively. Finally, we show that the improvement over the baseline pooled system can be attained even when the number of in-domain speakers is significantly reduced, down to 30 in most of the evaluation conditions.
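Since the results above are reported as improvements in equal error rate, a minimal sketch of how an EER and a relative improvement might be computed from verification scores may help. This is not the paper's evaluation code: the function names `equal_error_rate` and `relative_improvement` are hypothetical, the threshold sweep is a simple approximation without interpolation, and it assumes the reported percentages are relative reductions in EER.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Higher scores are assumed to indicate a same-speaker trial.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scoring at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to equal.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_improvement(baseline_eer, system_eer):
    """Relative EER reduction over a baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_eer - system_eer) / baseline_eer


if __name__ == "__main__":
    # Illustrative, made-up scores; not data from the paper.
    tgt = [0.2, 0.8, 0.9, 1.0]
    non = [0.0, 0.1, 0.3, 0.85]
    print(equal_error_rate(tgt, non))
```

Under this definition, a system that lowers a baseline EER from 10.0% to 7.3% would show a 27% relative improvement; those two EER values are illustrative only and do not come from the paper.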
Journal: Computer Speech & Language - Volume 47, January 2018, Pages 240-258