IRM estimation based on data field of cochleagram for speech enhancement

Article ID	Journal	Published Year	Pages	File Type
6960686	Speech Communication	2018	20 Pages	PDF

Abstract

When computational auditory scene analysis (CASA) is used for speech enhancement, it can mask noise effectively provided that the mask is estimated accurately. In this paper, we estimate the ideal ratio mask (IRM) on the speech cochleagram by exploiting spectral dependency. To capture this dependency, the concept of the data field (DF) is introduced to model the time-frequency (T-F) relationships within the cochleagram, so that the resulting values (termed potentials), which incorporate adjacent spectral information, can be used to estimate the IRM. In the estimation framework, we first use a pre-processing module to obtain initial T-F estimates of noise and speech. Given these initial estimates, we then apply the DF model to derive the forms of the speech and noise potentials, each of which can be viewed as the energy of a T-F unit combined with information from its neighbors. Next, based on these forms, the optimal potentials, which reflect the respective optimal distributions of speech and noise, are obtained by selecting optimal influence factors. Finally, the masking value is computed from the speech and noise potentials and used to restore the clean target speech signal. The algorithm is evaluated against reference methods and yields an improvement in speech quality.
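
The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the potential function, the neighborhood size, or the rule for choosing the influence factor, so the Gaussian-like kernel, the `radius` parameter, the fixed `sigma`, and the function names `data_field_potential` and `ratio_mask` are all assumptions made for illustration. The mask uses the commonly cited IRM form, (S / (S + N))^beta with beta = 0.5, applied to potentials instead of raw energies; the paper's actual masking rule may differ.

```python
import math


def data_field_potential(energy, sigma=1.0, radius=1):
    """Potential of each T-F unit: Gaussian-weighted sum of nearby energies.

    energy: 2-D list indexed as [channel][frame] (a cochleagram-like grid).
    sigma:  influence factor (assumed fixed here; the paper optimizes it).
    radius: neighborhood half-width in units (hypothetical parameter).
    """
    n_ch, n_fr = len(energy), len(energy[0])
    potential = [[0.0] * n_fr for _ in range(n_ch)]
    for c in range(n_ch):
        for t in range(n_fr):
            total = 0.0
            for dc in range(-radius, radius + 1):
                for dt in range(-radius, radius + 1):
                    cc, tt = c + dc, t + dt
                    if 0 <= cc < n_ch and 0 <= tt < n_fr:
                        dist = math.hypot(dc, dt)
                        # Assumed kernel: exp(-(d / sigma)^2); weight is 1 at d = 0.
                        total += energy[cc][tt] * math.exp(-((dist / sigma) ** 2))
            potential[c][t] = total
    return potential


def ratio_mask(speech_pot, noise_pot, beta=0.5, eps=1e-12):
    """Ratio mask in [0, 1] computed from speech and noise potentials."""
    return [
        [(s / (s + n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_pot, noise_pot)
    ]


# Toy initial estimates (made-up values, 2 channels x 2 frames).
speech_est = [[4.0, 4.0], [4.0, 4.0]]
noise_est = [[1.0, 1.0], [1.0, 1.0]]

speech_pot = data_field_potential(speech_est, sigma=1.0)
noise_pot = data_field_potential(noise_est, sigma=1.0)
mask = ratio_mask(speech_pot, noise_pot)
```

In an enhancement system, the mask would be multiplied element-wise with the noisy cochleagram before resynthesis. Because each potential sums over neighbors, an isolated error in one unit's initial estimate is smoothed by the surrounding units, which is the spectral dependency that the abstract refers to.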

Keywords

Speech enhancement; CASA; Data field