Article code: 566682
Journal code: 1452021
Publication year: 2016
English article: 16 pages, PDF
Full text: Free download
English title of the ISI article
Improving speaker verification performance against long-term speaker variability
Keywords
Speaker verification; Long-term speaker variability; Discriminability emphasis; Frequency warping; Output weighting
Related topics
Engineering and Basic Sciences; Computer Engineering; Signal Processing
Abstract


• A specially-collected speech database to reflect the long-term speaker variability.
• F-ratio to determine the importance of speaker- and session-specific information.
• Frequency warping and filter-bank outputs weighting strategies for feature extraction.

Speaker verification performance degrades when the test speech comes from sessions spread over a long period of time. Common ways to alleviate this long-term degradation are enrollment data augmentation, speaker model adaptation, and adapted verification thresholds. From the feature perspective of a pattern recognition system, robust features that are speaker-specific and invariant to time and acoustic environment are preferred for dealing with this long-term variability. In this paper, using a newly created speech database, CSLT-Chronos, specially collected to reflect long-term speaker variability, we investigate the issue in the frequency domain by emphasizing higher discrimination for speaker-specific information and lower sensitivity to time-related, session-specific information. The F-ratio is employed as the figure of merit to judge these two kinds of information and to find a compromise between them. Inspired by the feature extraction procedure of the traditional MFCC calculation, two emphasis strategies are explored for generating modified acoustic features for speaker verification: pre-filtering frequency warping and post-filtering weighting of the filter-bank outputs. Experiments show that the two proposed features outperform the traditional MFCC on CSLT-Chronos. The proposed approach is also studied on the NIST SRE 2008 database in a state-of-the-art, i-vector based architecture. Experimental results demonstrate the advantage of the proposed features over MFCC in LDA and PLDA based i-vector systems.
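
As a rough illustration of the F-ratio criterion and the filter-bank output weighting mentioned in the abstract, the sketch below computes a per-band F-ratio and turns two such F-ratios into band weights. The abstract does not give the paper's formulas, so everything here is an assumption: the F-ratio is taken in its common textbook form (variance of the group means divided by the mean within-group variance), and `band_weights`, including its `speaker_f / (1 + session_f)` compromise rule, is a hypothetical helper rather than the authors' method.

```python
from statistics import fmean, pvariance


def f_ratio(values, labels):
    """F-ratio of one frequency band (assumed textbook form).

    values: per-frame energies of a single filter-bank band.
    labels: group id of each frame (speaker ids or session ids).
    Returns variance of the group means / mean within-group variance.
    Groups are treated as equally weighted; a zero within-group
    variance is not handled in this sketch.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    group_means = [fmean(vs) for vs in groups.values()]
    within = fmean([pvariance(vs) for vs in groups.values()])
    return pvariance(group_means) / within


def band_weights(speaker_f, session_f):
    """Hypothetical compromise rule, not taken from the paper.

    Bands with a high speaker F-ratio and a low session F-ratio get
    larger weights; weights are scaled so that their mean is 1.
    """
    scores = [fs / (1.0 + ft) for fs, ft in zip(speaker_f, session_f)]
    total = sum(scores)
    return [len(scores) * s / total for s in scores]


def apply_weights(band_energies, weights):
    """Scale the filter-bank outputs of one frame by the band weights."""
    return [w * e for w, e in zip(weights, band_energies)]
```

In use, `f_ratio` would be called once per band with speaker labels and once with session labels, and the resulting weights would scale the filter-bank outputs before the remaining MFCC steps. The frequency-warping strategy is not sketched here because the abstract gives no detail on how the warping function is built.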

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 79, May 2016, Pages 14–29