A Hilbert-fine-structure-derived physical metric for predicting the intelligibility of noise-distorted and noise-suppressed speech

Article ID	Journal	Published Year	Pages	File Type
567090	Speech Communication	2013	10 Pages	PDF

Abstract

• Existing metrics use primarily envelope information to model speech intelligibility variance.
• We propose to predict speech intelligibility using the Hilbert-derived TFS waveform.
• The TFS-based metric predicts speech intelligibility better than most existing measures.

Despite the established importance of temporal fine structure (TFS) for speech perception in noise, existing speech transmission metrics rely primarily on envelope information to model variance in speech intelligibility. This study proposes a new physical metric that predicts speech intelligibility from the Hilbert-derived TFS waveform. The metric makes explicit use of the coherence information contained in the complex spectra of the Hilbert-derived TFS waveforms of the clean and corrupted speech signals, and assesses the extent to which coherence in the Hilbert fine structure is affected by linear or non-linear processing of the stimulus (e.g., noise distortion or speech enhancement). Doing so significantly improves the predictive power of the intelligibility measure for noise-distorted and noise-suppressed speech signals. When evaluated against speech recognition scores obtained from normal-hearing listeners, covering sixty-four noise-suppressed conditions with non-linear distortions and eight noisy conditions without subsequent noise reduction, the proposed TFS-based measure predicted speech intelligibility better than most envelope- and coherence-based measures. High correlation was maintained for all types of maskers tested, with a maximum correlation of r = 0.95 achieved in car and street noise conditions.
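
The core idea of the abstract, comparing the Hilbert fine structure of the clean and corrupted signals through a coherence measure, can be illustrated with a minimal sketch. This is not the authors' metric: the abstract gives no band decomposition, weighting, or mapping to intelligibility scores, so everything below is an assumption made for illustration. The function names (`analytic_signal`, `hilbert_tfs`, `msc`, `tfs_coherence_score`), the frame length, the hop size, and the synthetic test signal are all hypothetical. The sketch takes the TFS as the cosine of the instantaneous phase of the analytic signal and averages a Welch-style magnitude-squared coherence over frequency.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal: zero the negative frequencies, double the positive ones."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def hilbert_tfs(x):
    """Assumed TFS definition: cosine of the instantaneous (Hilbert) phase, envelope discarded."""
    return np.cos(np.angle(analytic_signal(x)))


def msc(x, y, frame_len=256, hop=128):
    """Welch-style magnitude-squared coherence per frequency bin (illustrative parameters)."""
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    X = np.array([np.fft.rfft(window * x[s:s + frame_len]) for s in starts])
    Y = np.array([np.fft.rfft(window * y[s:s + frame_len]) for s in starts])
    cross = np.mean(X * np.conj(Y), axis=0)   # averaged cross-spectrum of the complex spectra
    pxx = np.mean(np.abs(X) ** 2, axis=0)     # averaged auto-spectra
    pyy = np.mean(np.abs(Y) ** 2, axis=0)
    return np.abs(cross) ** 2 / (pxx * pyy + 1e-12)


def tfs_coherence_score(clean, corrupted):
    """Hypothetical summary score: mean coherence between the two TFS waveforms."""
    return float(np.mean(msc(hilbert_tfs(clean), hilbert_tfs(corrupted))))


def add_noise(clean, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio."""
    noise = rng.standard_normal(len(clean))
    gain = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return clean + gain * noise


# Synthetic stand-in for speech (an assumption): slowly modulated broadband noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(len(t))

for snr_db in (20, 0):
    score = tfs_coherence_score(clean, add_noise(clean, snr_db, rng))
    print(f"SNR {snr_db:>3} dB: TFS coherence score = {score:.3f}")
```

In this toy setting the score drops as more noise is added, which is the qualitative behaviour an intelligibility predictor needs. A metric of the kind the abstract describes would presumably also need a band-wise analysis and a fit to listener scores, neither of which is attempted here.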

Keywords

Speech transmission index; Speech intelligibility