Article code: 4977815
Journal code: 1452013
Publication year: 2017
English article: 9 pages, PDF
English title of the ISI article
Influence of binary mask estimation errors on robust speaker identification
Persian translation of the title
تاثیر خطاهای برآورد ماسک دوتایی در شناسایی قوی سخنران (The effect of binary mask estimation errors on robust speaker identification)
Keywords
Speaker identification; Missing data; Ideal binary mask; Estimated binary mask; Bounded marginalization; Full marginalization; Direct masking
Related topics
Engineering and Basic Sciences; Computer Engineering; Signal Processing
English abstract

Missing-data strategies have been developed to improve the noise-robustness of automatic speech recognition systems in adverse acoustic conditions. This is achieved by classifying time-frequency (T-F) units into reliable and unreliable components, as indicated by a so-called binary mask. Different approaches have been proposed to handle unreliable feature components, each with distinct advantages. The direct masking (DM) approach attenuates unreliable T-F units in the spectral domain, which allows the extraction of conventionally used mel-frequency cepstral coefficients (MFCCs). Instead of attenuating unreliable components in the feature extraction front-end, full marginalization (FM) discards unreliable feature components in the classification back-end. Finally, bounded marginalization (BM) can be used to combine the evidence from both reliable and unreliable feature components during classification. Since each of these approaches utilizes the knowledge about reliable and unreliable feature components in a different way, they will respond differently to estimation errors in the binary mask. The goal of this study was to identify the most effective strategy to exploit knowledge about reliable and unreliable feature components in the context of automatic speaker identification (SID). A systematic evaluation under ideal and non-ideal conditions demonstrated that the robustness to errors in the binary mask varied substantially across the different missing-data strategies. Moreover, full and bounded marginalization showed complementary performances in stationary and non-stationary background noises and were subsequently combined using a simple score fusion. This approach consistently outperformed individual SID systems in all considered experimental conditions.
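
The three strategies named in the abstract, plus the score fusion, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the GMM speaker model with diagonal covariances, the attenuation gain, the lower integration bound of zero, and the equal-weight fusion are all assumptions made for illustration; the abstract specifies none of them.

```python
import math
import numpy as np

# Element-wise error function built from the standard library (avoids SciPy).
_erf = np.vectorize(math.erf)


def direct_mask(spectrum, mask, gain=0.1):
    """Direct masking: attenuate unreliable T-F units in the spectral domain.

    `gain` is a hypothetical attenuation factor for unreliable units.
    """
    return spectrum * np.where(mask.astype(bool), 1.0, gain)


def _log_gauss(x, mean, var):
    # Log density of a univariate Gaussian, evaluated element-wise.
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def _gauss_cdf(x, mean, var):
    # Cumulative distribution of a univariate Gaussian, element-wise.
    return 0.5 * (1.0 + _erf((x - mean) / np.sqrt(2.0 * var)))


def frame_log_likelihood(x, mask, weights, means, variances,
                         bounded=False, lower=0.0):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM.

    x: (D,) features, mask: (D,) with True = reliable,
    weights: (K,), means and variances: (K, D).
    bounded=False -> full marginalization: unreliable dimensions are dropped.
    bounded=True  -> bounded marginalization: each unreliable dimension
                     contributes the Gaussian mass between `lower` and the
                     observed value (assumed upper bound on the speech energy).
    """
    reliable = mask.astype(bool)
    log_rel = np.where(reliable, _log_gauss(x, means, variances), 0.0)
    log_comp = np.log(weights) + log_rel.sum(axis=1)
    if bounded:
        mass = _gauss_cdf(x, means, variances) - _gauss_cdf(lower, means, variances)
        log_mass = np.log(np.maximum(mass, 1e-300))  # guard against log(0)
        log_comp = log_comp + np.where(reliable, 0.0, log_mass).sum(axis=1)
    peak = log_comp.max()
    return peak + np.log(np.exp(log_comp - peak).sum())  # log-sum-exp over components


def fuse_scores(fm_score, bm_score, weight=0.5):
    """Weighted sum of the two scores; the weight is an assumption."""
    return weight * fm_score + (1.0 - weight) * bm_score
```

In this sketch a mask error affects the strategies differently: a unit wrongly marked unreliable is simply ignored under full marginalization, whereas under bounded marginalization it still constrains the likelihood through the integration bounds.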

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 87, March 2017, Pages 40-48