Article code: 565409
Journal code: 875759
Publication year: 2009
English article: 11 pages, PDF
Full-text version: free download
English title of the ISI article
Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero-crossings
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
Abstract

This paper describes an algorithm called zero-crossing-based amplitude estimation (ZCAE) that enhances speech by reconstructing the desired signal from a mixture of two signals using continuously-variable weighting factors, based on pre-processing that is motivated by the well-known ability of the human auditory system to resolve spatially-separated signals. Although most conventional methods of signal separation have been based on interaural time differences (ITDs) derived from cross-correlation information, the ZCAE approach provides sound segregation based on estimates of ITD from comparisons of zero-crossings [Kim, Y.-I., An, S.J., Kil, R.M., Park, H.-M., 2005. Sound segregation based on binaural zero-crossings. In: Proc. European Conf. on Speech Communication and Technology (INTERSPEECH-2005), Lisbon, Portugal, pp. 2325–2328]. These ITD estimates are used to determine the relative contribution of the desired source in a mixture and subsequently to reconstruct a closer approximation to the desired signal. The estimation of relative target intensity in a given time-frequency segment is accomplished by analytically deriving a monotonic function that maps the estimated ITD in each time-frequency segment to the putative relative intensity of each source. The ZCAE method is evaluated by comparing the sample standard deviation of ITD estimates derived using cross-correlation and using zero-crossing information, by comparing the speech recognition accuracy that is obtained by applying the proposed methods to speech in the presence of interfering speech sources, and by comparing recognition accuracy obtained using a continuous weighting versus a binary weighting of the target and masker. It is found that better results are obtained when ITDs are estimated using zero-crossing information rather than cross-correlation information, and when continuous weighting functions are used in place of binary weighting of the target and masker in each time-frequency segment.
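
The core idea in the abstract, estimating ITD from zero-crossings and then weighting each segment either continuously or with a binary decision, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it works on a single band rather than on the time-frequency segments the abstract describes, and all function names are hypothetical. In particular, `continuous_weight` is a simple linear, clipped stand-in for the analytically derived monotonic mapping described above, whose exact form is not given here.

```python
import numpy as np

def upward_zero_crossings(x, fs):
    """Times (s) of negative-to-positive zero-crossings, linearly interpolated."""
    x = np.asarray(x, dtype=float)
    idx = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    frac = -x[idx] / (x[idx + 1] - x[idx])  # sub-sample position of the zero
    return (idx + frac) / fs

def itd_from_zero_crossings(left, right, fs):
    """One ITD estimate per left crossing: nearest right crossing minus left.

    Positive values mean the right channel lags the left channel.
    """
    tl = upward_zero_crossings(left, fs)
    tr = upward_zero_crossings(right, fs)
    if tl.size == 0 or tr.size == 0:
        return np.array([])
    nearest = tr[np.abs(tr[None, :] - tl[:, None]).argmin(axis=1)]
    return nearest - tl

def continuous_weight(itd, target_itd, masker_itd):
    """Assumed stand-in mapping: linear in ITD, clipped to [0, 1].

    Returns 1 at the target's ITD and 0 at the masker's ITD.
    """
    w = (np.asarray(itd, dtype=float) - masker_itd) / (target_itd - masker_itd)
    return np.clip(w, 0.0, 1.0)

def binary_weight(itd, target_itd, masker_itd):
    """Binary weighting: 1 where the ITD is closer to the target's than the masker's."""
    itd = np.asarray(itd, dtype=float)
    return (np.abs(itd - target_itd) < np.abs(itd - masker_itd)).astype(float)

if __name__ == "__main__":
    fs = 16000                       # sampling rate in Hz (illustrative value)
    t = np.arange(800) / fs          # 50 ms of signal
    delay = 200e-6                   # right channel lags by 200 microseconds
    left = np.sin(2 * np.pi * 500 * t)
    right = np.sin(2 * np.pi * 500 * (t - delay))
    itd = itd_from_zero_crossings(left, right, fs)
    print("mean ITD estimate (s):", itd.mean())
    print("continuous weights:", continuous_weight(itd, 0.0, 400e-6)[:3])
    print("binary weights:", binary_weight(itd, 0.0, 400e-6)[:3])
```

The binary weighting discards everything on the masker's side of the midpoint, whereas the continuous weighting retains a graded share of each segment. That difference is what the abstract's final comparison evaluates.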

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 51, Issue 1, January 2009, Pages 15–25