Article ID: 6960686
Journal: Speech Communication
Published Year: 2018
Pages: 20
File Type: PDF
Abstract
When computational auditory scene analysis (CASA) is used for speech enhancement, it can suppress noise effectively provided the mask is estimated accurately. In this paper, we estimate the ideal ratio mask (IRM) on the speech cochleagram by exploiting spectral dependency. To capture this dependency, we introduce the concept of the data field (DF) to model the time-frequency (T-F) relationships in the cochleagram; the resulting values (termed potentials), which incorporate adjacent spectral information, are then used to estimate the IRM. The estimation framework proceeds in four steps. First, a pre-processing module produces initial T-F estimates of noise and speech. Second, given these initial estimates, the DF model yields the forms of the speech and noise potentials, which can be viewed as the energy of a T-F unit combined with information from its neighbors. Third, from these forms, the optimal potentials, which reflect the respective optimal distributions of speech and noise, are obtained via optimal influence factors. Finally, the masking value is computed from the speech and noise potentials and used to restore the clean target speech signal. The algorithm is evaluated against reference methods and yields an improvement in speech quality.
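The abstract does not give the potential function or the mask formula, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a Gaussian-type data-field kernel, exp(-(d/sigma)^2), summed over a small T-F neighborhood, and a simple ratio of speech potential to total potential as the mask. The function names `potential` and `ratio_mask`, the `radius` parameter, and the fixed influence factors `sigma_s` and `sigma_n` are all hypothetical; the paper selects optimal influence factors, which this sketch does not attempt.

```python
import numpy as np


def potential(energy, sigma, radius=2):
    """Gaussian-weighted sum of each T-F unit and its neighbors.

    Assumed data-field form: weight = exp(-(d / sigma)^2), where d is the
    grid distance between two T-F units. Not taken from the paper.
    """
    energy = np.asarray(energy, dtype=float)
    n_ch, n_fr = energy.shape
    padded = np.pad(energy, radius, mode="constant")  # zero energy outside the grid
    out = np.zeros_like(energy)
    for dc in range(-radius, radius + 1):        # offset along frequency channels
        for dt in range(-radius, radius + 1):    # offset along time frames
            weight = np.exp(-(dc * dc + dt * dt) / sigma ** 2)
            out += weight * padded[radius + dc: radius + dc + n_ch,
                                   radius + dt: radius + dt + n_fr]
    return out


def ratio_mask(speech_energy, noise_energy, sigma_s=1.0, sigma_n=1.0, eps=1e-12):
    """Mask in [0, 1] from speech and noise potentials (assumed ratio form)."""
    p_s = potential(speech_energy, sigma_s)
    p_n = potential(noise_energy, sigma_n)
    return p_s / (p_s + p_n + eps)


# Toy usage: random non-negative "initial estimates" on an 8-channel, 10-frame grid.
rng = np.random.default_rng(0)
speech = rng.random((8, 10))
noise = rng.random((8, 10))
mask = ratio_mask(speech, noise)
enhanced = mask * (speech + noise)  # apply the mask to the mixture energy
```

Because each potential pools energy from neighboring T-F units, an isolated error in the initial estimate of one unit has less effect on the mask than it would in a purely unit-by-unit ratio.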
Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing