Acoustic analysis and feature transformation from neutral to whisper for speaker identification within whispered speech audio streams

Article code	Journal code	Publication year	English article	Full-text version
567341	876070	2013	16-page PDF	Free download

Keywords

Vocal effort, Speaker identification, Whispered speech

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing

Abstract

Whispered speech is an alternative speech production mode from neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to prevent certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in vocal excitation and vocal tract function, the performance of automatic speaker identification systems trained with neutral speech degrades significantly when they are tested on whispered speech. In order to better understand these differences and to further develop efficient model adaptation and feature compensation methods, this study first analyzes the speaker and phoneme dependency of these differences by a maximum likelihood transformation estimation from neutral speech towards whispered speech. Based on analysis results, this study then considers a feature transformation method in the training phase that leads to a more robust speaker model for speaker ID on whispered speech without using whispered adaptation data from test speakers. Three estimation methods that model the transformation from neutral to whispered speech are applied, including convolutional transformation (ConvTran), constrained maximum likelihood linear regression (CMLLR), and factor analysis (FA). A speech mode independent (SMI) universal background model (UBM) is trained using collected real neutral features and transformed pseudo-whisper features generated with the estimated transformation. Text-independent closed-set speaker ID results using the UT-VocalEffort II corpus show performance improvement by using the proposed training framework. The best performance of 88.87% is achieved by using the ConvTran model, which represents a relative improvement of 46.26% compared to the 79.29% accuracy of the GMM-UBM baseline system. This result suggests that synthesizing pseudo-whispered speaker and background training data with the ConvTran model results in improved speaker ID robustness to whispered speech.
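
The 46.26% figure in the abstract is consistent with a relative reduction in error rate rather than a relative gain in accuracy. The short Python check below illustrates this reading using only the two accuracies quoted above; the function name is a hypothetical helper introduced for illustration, and the interpretation is inferred from the arithmetic, not stated in the abstract.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Relative reduction in error rate (percent), given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc   # error rate of the baseline system
    improved_err = 100.0 - improved_acc   # error rate of the improved system
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Accuracies quoted in the abstract: GMM-UBM baseline vs. ConvTran model.
reduction = relative_error_reduction(79.29, 88.87)
print(f"{reduction:.2f}%")  # prints 46.26%
```

In other words, the identification error falls from 20.71% to 11.13%, which is a little under half of the baseline error.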

► Differences between whisper speech and neutral speech production are analyzed.
► A new unsupervised algorithm is formulated that transforms neutral speech features to pseudo-whisper features.
► A speech mode independent UBM model is trained using transformed pseudo whisper features based on one of two methods.
► Vector Taylor series and constrained maximum likelihood linear regression with factor analysis adaptation are employed.
► Speaker ID evaluations show performance improvements for whisper test data.
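
The idea of training a speech mode independent UBM on real neutral features plus transformed pseudo-whisper features can be sketched in a few lines. The Python sketch below assumes an affine feature-space transform of the form x' = A x + b, which is the general form used by CMLLR. The feature vectors, the matrix `A`, the bias `b`, and the helper `affine_transform` are all made-up placeholders for illustration; in the paper's setting the transform would be estimated from data, and this sketch does not reproduce the authors' estimation procedure or the ConvTran or FA models.

```python
def affine_transform(x, A, b):
    """Apply x' = A x + b to a single feature vector (plain lists of floats)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


# Hypothetical 2-dimensional "neutral" feature vectors (toy values only).
neutral = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]

# Placeholder transform parameters; NOT estimated from any real data.
A = [[0.9, 0.1],
     [0.0, 1.2]]
b = [0.5, -0.3]

# Generate pseudo-whisper features from the neutral ones.
pseudo_whisper = [affine_transform(x, A, b) for x in neutral]

# Pool real neutral and pseudo-whisper features as UBM training data.
ubm_training_pool = neutral + pseudo_whisper
print(len(ubm_training_pool), "training vectors")
```

The pooled list would then be passed to whatever GMM training routine the system uses; that step is omitted here because the source does not describe it in detail.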

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 55, Issue 1, January 2013, Pages 119–134

Authors

Xing Fan, John H.L. Hansen
