Article code  Journal code  Publication year  English article  Full-text version
6960708  1452003  2018  30-page PDF  Free download
English title of the ISI article
A new time-frequency binary mask estimation method based on convex optimization of speech power
Persian translation of the title
روش برآورد ماسک باینری زمان جدید بر اساس بهینه سازی محدب قدرت گفتار است
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract
In conventional computational auditory scene analysis, segment segregation and pitch estimation are necessary. However, it is hard to obtain an accurate pitch contour of clean speech for segregating the segments at low signal-to-noise ratios. This often leads to an inaccurate estimate of the binary mask and produces artifacts and temporal discontinuities in the enhanced speech. To overcome this problem, this paper proposes a new binary mask estimation method based on convex optimization of speech power. The proposed method excludes segment segregation and pitch estimation, and uses only the speech power of each Gammatone channel as the key cue for labeling the binary masks. Considering the cross-correlation between the power spectra of noisy speech and noise in each channel, an objective function of speech power is built, and the speech power is solved for by the gradient descent method. The time-frequency units of speech and noise are then labeled by computing a decision factor derived from the powers of the noisy speech, the estimated speech, and the pre-estimated noise. Erroneous local masks are refined by time-frequency unit smoothing. Objective measurements, including segmental signal-to-noise ratio, HIT-False Alarm rate, percentage of energy loss, and percentage of noise residue, together with an additional subjective listening test, demonstrate the effectiveness of the proposed method.
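The abstract lists the HIT-False Alarm rate among its objective measurements without defining it. As the measure is commonly defined for binary masks, HIT is the percentage of speech-dominant time-frequency units in a reference (ideal) mask that the estimated mask also labels as speech, and FA is the percentage of noise-dominant units that the estimated mask wrongly labels as speech. The sketch below follows that assumed definition. The function name `hit_fa` and the toy masks are illustrative and do not come from the paper.

```python
def hit_fa(estimated_mask, ideal_mask):
    """Return (HIT, FA, HIT - FA) in percent for two binary masks.

    Each mask is a list of rows (one per frequency channel) holding a
    0/1 label per time-frequency unit; 1 marks a speech-dominant unit.
    This is an assumed, commonly used definition, not the paper's code.
    """
    hits = false_alarms = speech_units = noise_units = 0
    for est_row, ideal_row in zip(estimated_mask, ideal_mask):
        for est, ideal in zip(est_row, ideal_row):
            if ideal == 1:
                speech_units += 1
                hits += int(est == 1)          # speech unit kept
            else:
                noise_units += 1
                false_alarms += int(est == 1)  # noise unit kept by mistake
    hit = 100.0 * hits / speech_units if speech_units else 0.0
    fa = 100.0 * false_alarms / noise_units if noise_units else 0.0
    return hit, fa, hit - fa


# Toy example: 2 channels x 4 frames (hypothetical values).
ideal = [[1, 1, 0, 0],
         [1, 0, 0, 1]]
estimate = [[1, 0, 0, 0],
            [1, 0, 1, 1]]
hit, fa, score = hit_fa(estimate, ideal)
print(hit, fa, score)  # 75.0 25.0 50.0
```

A higher HIT-FA value means the estimated mask keeps more speech-dominant units while admitting fewer noise-dominant ones.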
Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 97, March 2018, Pages 51-65