Reverberant speech separation with probabilistic time–frequency masking for B-format recordings

Article code	Journal code	Publication year	English article	Full-text version
566771	1452032	2015	14-page PDF	Free download

Keywords

Direction of arrival (DOA); Blind source separation (BSS); Acoustic intensity

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing

Abstract

• We present a new algorithm for separating convolutive mixtures that combines the intensity vector of the acoustic field with probabilistic time–frequency masking.
• The DOA and mixing vector cues are modeled by a von Mises mixture model and a complex Gaussian mixture model, respectively; the parameters of both are updated iteratively via the EM algorithm.
• A reliability-based method is introduced that improves source separation performance by mitigating the effect of room reverberation.
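
As an illustration of the second point, the following is a minimal sketch of EM for a von Mises mixture over DOA angles, not the authors' implementation. It covers only the DOA cue; the complex Gaussian model for the mixing vector cue and the reliability weighting are omitted. The function names, the initial values, the cap on the concentration parameter, and the closed-form approximation used to update the concentration are all assumptions made for this example. The posterior returned for each angle can be read as a soft mask value for the corresponding T–F point.

```python
import math
import random


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 1.0, 1.0
    q = (x * x) / 4.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Von Mises density with mean direction mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def em_von_mises_mixture(angles, mus, kappa0=2.0, iters=50, kappa_max=30.0):
    """Fit a von Mises mixture to angles (radians), starting from initial means mus.

    Returns (weights, mus, kappas, post), where post[i][j] is the posterior
    probability of component j for angles[i], computed in the last E-step.
    """
    n, k = len(angles), len(mus)
    mus = list(mus)
    kappas = [kappa0] * k
    weights = [1.0 / k] * k
    post = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each angle.
        post = []
        for th in angles:
            lik = [weights[j] * von_mises_pdf(th, mus[j], kappas[j]) for j in range(k)]
            total = sum(lik)
            post.append([v / total for v in lik])
        # M-step: update weights, mean directions and concentrations.
        for j in range(k):
            nj = sum(p[j] for p in post)
            c = sum(p[j] * math.cos(th) for p, th in zip(post, angles))
            s = sum(p[j] * math.sin(th) for p, th in zip(post, angles))
            weights[j] = nj / n
            mus[j] = math.atan2(s, c)
            # Mean resultant length, clipped so the approximation stays finite.
            r = min(math.hypot(c, s) / nj, 0.999)
            # Assumed closed-form approximation for kappa (not an exact solve).
            kappas[j] = min(r * (2.0 - r * r) / (1.0 - r * r), kappa_max)
    return weights, mus, kappas, post


# Synthetic DOA estimates from two hypothetical sources at 0.5 rad and 2.5 rad.
rng = random.Random(0)
angles = [rng.vonmisesvariate(0.5, 8.0) for _ in range(200)]
angles += [rng.vonmisesvariate(2.5, 8.0) for _ in range(200)]
weights, mus, kappas, post = em_von_mises_mixture(angles, mus=[0.0, 3.0])
```

In this toy setting, post[i][0] and post[i][1] sum to one for every angle, so thresholding or directly applying them to the mixture spectrogram would give a binary or soft mask, respectively.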

Existing speech source separation approaches overwhelmingly rely on acoustic pressure information acquired with a microphone array. Little attention has been devoted to B-format microphones, which capture both acoustic pressure and pressure gradient and therefore allow direction of arrival (DOA) cues to be estimated from the received signal. In this paper, such DOA cues, together with frequency bin-wise mixing vector (MV) cues, are used to evaluate the contribution of a specific source at each time–frequency (T–F) point of the mixtures in order to separate that source from the mixture. A source separation algorithm is developed in which the DOA cues are modeled by a von Mises mixture model and the MV cues by a complex Gaussian mixture model, with the model parameters estimated via an expectation–maximization (EM) algorithm. A T–F mask is then derived from the model parameters to recover the sources. The separation performance is further improved by retaining only the reliable DOA estimates at the T–F units, selected by thresholding. The proposed method is evaluated in both simulated room environments and a real reverberant studio in terms of signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ). The experimental results show its advantage over four baseline algorithms: three T–F mask based approaches and one convolutive independent component analysis (ICA) based method.
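
To make the DOA cue concrete, here is a minimal sketch of how an azimuth estimate could be obtained at a single T–F point from the pressure and pressure-gradient channels via the acoustic intensity vector. It assumes horizontal-only B-format channels W, X, Y with the encoding X = sqrt(2) W cos(theta) and Y = sqrt(2) W sin(theta); the function names are hypothetical, and the paper's exact formulation and sign convention may differ.

```python
import cmath
import math


def doa_from_bformat(w, x, y):
    """Estimate the azimuth (radians) at one T-F point from B-format STFT coefficients.

    w is the omnidirectional (pressure) coefficient; x and y are the
    pressure-gradient coefficients. Assumes X = sqrt(2) W cos(theta) and
    Y = sqrt(2) W sin(theta) for a single plane wave arriving from theta.
    """
    # Components of the intensity vector, up to a common positive scale factor.
    ix = (w.conjugate() * x).real
    iy = (w.conjugate() * y).real
    return math.atan2(iy, ix)


def simulate_plane_wave(theta, s):
    """Encode a complex source coefficient s arriving from azimuth theta (assumed encoding)."""
    return s, math.sqrt(2) * s * math.cos(theta), math.sqrt(2) * s * math.sin(theta)


# A single anechoic source at 60 degrees with an arbitrary complex amplitude.
theta_true = math.radians(60.0)
w, x, y = simulate_plane_wave(theta_true, cmath.rect(0.8, 1.1))
theta_est = doa_from_bformat(w, x, y)
```

With one source and no reverberation the estimate matches the true azimuth. In a reverberant mixture the per-point estimates scatter around each source direction, which is the kind of spread a mixture model over angles is meant to capture.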

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 68, April 2015, Pages 41–54

Authors

Xiaoyi Chen, Wenwu Wang, Yingmin Wang, Xionghu Zhong, Atiyeh Alinaghi
