Article code: 566000
Journal code: 1452024
Publication year: 2016
English article: 21 pages, PDF
Full-text version: Free download
English title of the ISI article
Linguistically-constrained formant-based i-vectors for automatic speaker recognition
Persian translation of the title
I-vectors مبتنی بر زبان شناختی محدود برای تشخیص بلندگو خودکار
Keywords
Automatic speaker recognition; Formant frequencies; Formant dynamics; Linguistically-constrained systems
Related topics
Engineering and Basic Sciences > Computer Engineering > Signal Processing
English abstract


• We present an approach to automatic speaker verification through linguistically-constrained i-vector systems based on formant frequencies.
• An analysis of discriminative and calibration properties is presented for every linguistic unit (phones and diphones).
• An analysis of the best-performing units for different speakers reveals remarkable speaker-dependent specificities.
• Different approaches for selection and fusion of different linguistic units are also analysed.
• The fusion of cepstral-based and formant-based systems obtains improved performance.
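
As a rough illustration of score-level fusion across linguistic units, the sketch below averages the per-unit scores available for a single voice comparison. It is not the authors' implementation: the function name `average_fusion`, the unit labels, the score values, and the choice to skip units that do not occur in both utterances are all assumptions made for this example.

```python
from statistics import mean


def average_fusion(unit_scores):
    """Fuse per-unit scores for one trial with a plain average.

    unit_scores maps a linguistic unit label to the score produced by
    that unit's system (for example a log-likelihood ratio), or None
    when the unit does not occur in both utterances.
    Assumption: missing units are simply skipped.
    """
    available = [s for s in unit_scores.values() if s is not None]
    if not available:
        raise ValueError("no linguistic unit is shared by the two utterances")
    return mean(available)


# Hypothetical scores for one voice comparison (made-up values).
trial = {"AE": 1.5, "IY": -0.5, "N": 0.5, "AY": None}
fused = average_fusion(trial)
print(fused)
```

A logistic regression fusion would instead learn one weight per unit from training trials; the plain average needs no training, which makes it a natural baseline.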

This paper presents a large-scale study of the discriminative abilities of formant frequencies for automatic speaker recognition. Exploiting both the static and dynamic information in formant frequencies, we present linguistically-constrained formant-based i-vector systems that provide well-calibrated likelihood ratios for each comparison of the occurrences of the same isolated linguistic unit in two given utterances. First, the reported analysis of the discriminative and calibration properties of the different linguistic units provides useful insights, for instance, to forensic phonetic practitioners. Furthermore, it is shown that the set of most discriminative units varies from speaker to speaker. Second, the linguistically-constrained systems are combined at score level through average and logistic regression speaker-independent fusion rules, exploiting the different speaker-distinguishing information spread among the different linguistic units. Testing on the English-only trials of the core condition of the NIST 2006 SRE (24,000 voice comparisons of 5-minute telephone conversations from 517 speakers: 219 male and 298 female), we report equal error rates of 9.57% and 12.89% for male and female speakers respectively, using only formant frequencies as speaker-discriminative information. Additionally, when the formant-based system is fused with a cepstral i-vector system, we obtain relative improvements of ∼6% in EER (from 6.54% to 6.13%) and ∼15% in minDCF (from 0.0327 to 0.0279) compared to the cepstral system alone.
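
The relative improvements quoted in the abstract can be checked with simple arithmetic, and the equal error rate (EER) itself can be approximated from two lists of scores. The sketch below shows both; the function names and the toy scores are assumptions for illustration, and the threshold sweep is a simple approximation rather than the evaluation tooling used in the paper.

```python
def relative_improvement(baseline, fused):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - fused) / baseline


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the miss rate and the false-alarm rate
    are closest, as the mean of the two rates.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer


# Figures reported in the abstract (cepstral alone vs. fused system).
eer_gain = relative_improvement(6.54, 6.13)       # about 6.3 %
dcf_gain = relative_improvement(0.0327, 0.0279)   # about 14.7 %

# Toy scores (made-up values) for same-speaker and different-speaker trials.
toy_eer = equal_error_rate([2.0, 1.0, 0.5, -0.2], [-1.0, -0.5, 0.0, 0.8])
print(round(eer_gain, 2), round(dcf_gain, 2), toy_eer)
```

Both computed gains agree with the rounded ∼6% and ∼15% figures stated in the abstract.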

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 76, February 2016, Pages 61–81