Free article download: Distant speech separation using predicted time–frequency masks from spatial features

Article code	Journal code	Publication year	English article	Full-text version
566775	1452032	2015	10-page PDF	Free download

English title of the ISI article

Distant speech separation using predicted time–frequency masks from spatial features

Persian translation of the title

جداسازی گفتار دور با استفاده از ماسک های زمان پیش بینی شده از ویژگی های فضایی

Keywords

Time–frequency masking; Microphone arrays; Speech separation; Neural networks; Beamforming

Related topics

Engineering and basic sciences, Computer engineering, Signal processing

English abstract

• A neural network for time–frequency mask prediction is proposed.
• The network is trained using simulated speech to produce naturally occurring masks.
• After a post-processing stage, the mask is used as a post-filter of a beamformer.
• Speech mixtures recorded with a circular array in two rooms are separated.
• The method shows the best intelligibility and SNR values among the compared methods.

Speech separation algorithms face the difficult task of producing a high degree of separation without introducing unwanted artifacts. The time–frequency (T–F) masking technique applies a real-valued (or binary) mask to the signal’s spectrum to filter out unwanted components. The practical difficulty lies in the mask estimation. Often, using efficient masks engineered for separation performance leads to unwanted musical noise artifacts in the separated signal. This lowers the perceptual quality and intelligibility of the output.

Microphone arrays have long been studied for processing distant speech. This work uses a feed-forward neural network to map a microphone array’s spatial features into a T–F mask. A Wiener filter is used as the desired mask for training the neural network with speech examples in a simulated setting. The T–F masks predicted by the neural network are combined to obtain an enhanced separation mask that exploits the information regarding interference between all sources. The final mask is applied to the delay-and-sum beamformer (DSB) output.

The algorithm’s objective separation capability, together with the intelligibility of the separated speech, is tested with recorded speech from distant talkers in two rooms at two distances. The results show improvement in an instrumental intelligibility measure and in frequency-weighted SNR over the complex-valued non-negative matrix factorization (CNMF) source separation approach, spatial sound source separation, and conventional beamforming methods such as the DSB and minimum variance distortionless response (MVDR).
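The masking pipeline named in the abstract (a Wiener-type mask applied to a delay-and-sum beamformer output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact mask formula, so the standard power-ratio Wiener gain is assumed, and the mask is computed from known toy source spectra rather than predicted by a neural network from spatial features. The function names, delays, and toy data are all hypothetical.

```python
import numpy as np

def delay_and_sum(X, freqs, delays):
    """Frequency-domain delay-and-sum beamformer (illustrative sketch).

    X:      complex STFTs, shape (mics, freq_bins, frames)
    freqs:  bin centre frequencies in Hz, shape (freq_bins,)
    delays: propagation delay of the target to each mic in seconds, shape (mics,)
    """
    # Phase advance that undoes each microphone's propagation delay.
    steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    # Align the channels, then average them.
    return np.mean(X * steering[:, :, None], axis=0)

def wiener_mask(target_power, interference_power, eps=1e-12):
    """Assumed power-ratio Wiener gain, real-valued and bounded by 0 and 1."""
    return target_power / (target_power + interference_power + eps)

def propagate(S, freqs, delays):
    """Toy far-field model: each mic sees a delayed copy of the source."""
    phase = np.exp(-2j * np.pi * freqs[None, :, None] * delays[:, None, None])
    return S[None, :, :] * phase

# Toy data (hypothetical): 3 microphones, 5 frequency bins, 4 frames.
rng = np.random.default_rng(0)
freqs = np.linspace(0.0, 8000.0, 5)
delays = np.array([0.0, 1.0e-4, 2.5e-4])          # target direction
interf_delays = np.array([0.0, -0.5e-4, 1.5e-4])  # interferer direction

S = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
N = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

# With the target alone, steering to its delays recovers its spectrum.
aligned = delay_and_sum(propagate(S, freqs, delays), freqs, delays)

# Mixture at the array, beamformed toward the target.
X = propagate(S, freqs, delays) + propagate(N, freqs, interf_delays)
Y = delay_and_sum(X, freqs, delays)

# Oracle mask from the known toy spectra, applied as a post-filter.
mask = wiener_mask(np.abs(S) ** 2, np.abs(N) ** 2)
S_hat = mask * Y
```

In the setting the abstract describes, the oracle `mask` would be replaced by the network's prediction, and the Wiener mask would serve only as the training target.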

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 68, April 2015, Pages 97–106

Authors

Pasi Pertilä, Joonas Nikunen
