Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
4977779 | 1452008 | 2017 | 44-page PDF | Free download |
English title of the ISI article
Text-independent speaker identification using robust statistics estimation
Persian translation of the title
شناسایی بلندگوهای متن با استفاده از برآورد آمار قوی
Keywords
Text-independent speaker identification, Gaussian mixture model, robust statistics estimation, convex optimization
Related topics
Engineering and basic sciences
Computer engineering
Signal processing
English abstract
It is well-known that the performance of Gaussian mixture model-based text-independent speaker identification systems deteriorates significantly with the presence of noise and spectral distortion in the training and testing utterances. In this paper, we propose a novel GMM-based speaker identification system based on two robust-statistics estimation methods: the minimum volume ellipsoid method, and the minimum covariance determinant method. Compared to the traditional maximum likelihood estimation method, the proposed methods are less sensitive to outliers in the feature-vector space caused by additive noise and spectral distortion. Moreover, in the testing phase, we propose a simple distance metric to be used for comparing the unknown testing utterance against the speakers' models. Furthermore, we derive a more robust version of the i-vector extractor, named robust i-vector, which utilizes our proposed robust estimation methods for estimating the parameters of the base universal background model. The proposed classification system has been applied to the NIST 2000 speaker recognition evaluation and the COSINE database. It has also been compared against state-of-the-art techniques such as the GMM/UBM method, the super-vectors method, and the i-vector methods. Experimental results show that the proposed classification system provides up to 16% relative improvement in the identification performance over the i-vector methods for short utterances in the NIST 2000 database and up to 8% when the utterances of the NIST 2000 database are contaminated by different types of artificial noise for signal-to-noise ratio ranging from 0 to 20 dB. For the COSINE database, the robust i-vector estimation provides an absolute improvement of up to 8%. Finally, the real time factor of the proposed distance metric for testing is 55% higher than the RT of the regular ML scoring.
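The abstract's contrast between maximum likelihood estimation and the minimum covariance determinant (MCD) method can be illustrated with a small one-dimensional sketch. This is not the paper's implementation, which works on multivariate feature vectors inside a GMM; it only shows, under simplifying assumptions, why an MCD-style estimate is less sensitive to outliers. The function names `ml_estimate` and `mcd_estimate_1d`, the toy data, and the choice of subset size `h` are all hypothetical, and the MCD variance is the raw value with no consistency correction.

```python
import statistics


def ml_estimate(samples):
    """Maximum-likelihood mean and variance of a 1-D Gaussian."""
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu=mean)
    return mean, var


def mcd_estimate_1d(samples, h=None):
    """Raw 1-D minimum covariance determinant estimate (illustrative).

    In one dimension the covariance determinant is simply the variance,
    and the best subset of size h is a run of h consecutive sorted
    values, so scanning all such windows finds the exact minimum.
    """
    xs = sorted(samples)
    n = len(xs)
    if h is None:
        h = (n + 2) // 2  # assumed default: floor((n + p + 1) / 2) with p = 1
    best = None
    for start in range(n - h + 1):
        window = xs[start:start + h]
        mean = statistics.fmean(window)
        var = statistics.pvariance(window, mu=mean)
        if best is None or var < best[1]:
            best = (mean, var)
    return best


# Toy data: 21 clean points centered on 0, plus 3 gross outliers
# standing in for noise-corrupted feature values.
clean = [0.1 * k for k in range(-10, 11)]
outliers = [50.0, 55.0, 60.0]
data = clean + outliers

ml_mean, ml_var = ml_estimate(data)
mcd_mean, mcd_var = mcd_estimate_1d(data)

print(f"ML : mean={ml_mean:.3f}, variance={ml_var:.3f}")
print(f"MCD: mean={mcd_mean:.3f}, variance={mcd_var:.3f}")
```

In this toy setting the three outliers pull the maximum likelihood mean far from the clean data and inflate its variance, while the MCD estimate is computed only from the tightest half of the samples and stays close to the clean cluster.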
Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 92, September 2017, Pages 52-63
Authors
Moataz El Ayadi, Abdel-Karim S.O. Hassan, Ahmed Abdel-Naby, Omar A. Elgendy