Modulation domain blind speech separation in noisy environments

Speech Communication, 2013, 19 pages

Abstract

• We propose a method for noise-robust blind speech separation in the modulation domain.
• We use modulation domain complex spectral subtraction to preprocess speech mixtures.
• We use asymmetric Laplacian mixture modeling to estimate source DOAs.
• We use modulation domain separation to increase source sparsity.
• Our method showed superior performance in PESQ, segmental SDR, and SIR gain.
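The complex (real and imaginary) spectral subtraction step named in the highlights can be pictured with a minimal NumPy sketch. It is not the authors' implementation: the frame lengths, Hann windows, the noise estimate (averaged from a separate noise-only recording), and the subtraction rule (shrinking |Re| and |Im| independently while keeping their signs, with a 1% floor) are all assumptions for illustration, and the helpers `stft`, `modulation_stft`, and `real_imag_subtract` are hypothetical. The modulation transform here is a second FFT taken along time over each acoustic bin's complex trajectory.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Acoustic STFT (hypothetical helper): complex array of shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n = 1 + (len(x) - n_fft) // hop
    return np.fft.rfft(np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n)]), axis=1)

def modulation_stft(X, m_len=32, m_hop=8):
    """Second STFT along time of every acoustic bin's complex trajectory.
    Returns shape (acoustic bins, modulation frames, modulation bins)."""
    win = np.hanning(m_len)[:, None]
    n = 1 + (X.shape[0] - m_len) // m_hop
    segs = np.stack([X[i * m_hop:i * m_hop + m_len] * win for i in range(n)])  # (n, m_len, bins)
    return np.fft.fft(segs, axis=1).transpose(2, 0, 1)

def real_imag_subtract(Z, noise_re, noise_im, floor=0.01):
    """Assumed MRISS-style rule: subtract noise estimates from |Re| and |Im|
    separately, keep the original signs, and never go below floor * |part|."""
    re = np.sign(Z.real) * np.maximum(np.abs(Z.real) - noise_re, floor * np.abs(Z.real))
    im = np.sign(Z.imag) * np.maximum(np.abs(Z.imag) - noise_im, floor * np.abs(Z.imag))
    return re + 1j * im

# Toy example: a 4 Hz amplitude-modulated tone stands in for speech.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
clean = np.sin(2 * np.pi * 440 * t) * (1 + np.sin(2 * np.pi * 4 * t))
noisy = clean + 0.3 * rng.standard_normal(t.size)
noise_only = 0.3 * rng.standard_normal(t.size)  # assumed noise-only reference

Z = modulation_stft(stft(noisy))
Zn = modulation_stft(stft(noise_only))
# Noise estimate per (acoustic bin, modulation bin), averaged over modulation frames.
noise_re = np.mean(np.abs(Zn.real), axis=1, keepdims=True)
noise_im = np.mean(np.abs(Zn.imag), axis=1, keepdims=True)
Zs = real_imag_subtract(Z, noise_re, noise_im)
```

Because the real and imaginary parts are shrunk separately, each modulation-domain coefficient changes in phase as well as in magnitude, unlike magnitude-only subtraction. Resynthesis (inverse FFT and overlap-add through both stages) is omitted from this sketch.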

We propose a noise-robust blind speech separation (BSS) method that uses two microphones. We perform BSS in the modulation domain to take advantage of the improved signal sparsity and reduced musical tone noise that this domain offers compared with conventional acoustic frequency domain processing. We first use modulation domain real and imaginary spectral subtraction (MRISS) to enhance both the magnitude and the phase spectra of the noisy speech mixtures. We then estimate the directions of arrival (DOAs) of the speech sources from subband inter-sensor phase differences (IPDs) using an asymmetric Laplacian mixture model (ALMM), cluster the full-band IPDs via the estimated DOAs, and perform time–frequency masking to separate the source signals, all in the modulation domain. Experimental evaluations with five types of noise show that the proposed method is robust at SNRs from 0 to 10 dB and outperforms acoustic domain separation without MRISS.
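The DOA estimation and masking stages can be illustrated on a toy two-microphone mixture. This sketch makes substitutions that should be read as assumptions: it works on acoustic STFT coefficients rather than in the modulation domain, it replaces the ALMM fit with an energy-weighted histogram of per-bin DOAs followed by peak picking, and it takes the "subband" to be the range below c/(2d) (about 3.4 kHz for the assumed 5 cm spacing), where the IPD cannot wrap. The sources are bin-centred tones standing in for speech, and the sampling rate, spacing, and helper names (`estimate_doas`, `ipd_masks`, `tones`) are hypothetical. Every full-band bin is then assigned to whichever estimated DOA predicts the closest wrapped IPD.

```python
import numpy as np

FS, N_FFT, HOP = 16000, 512, 128  # assumed sampling rate and STFT settings
C, D = 343.0, 0.05                # speed of sound (m/s) and assumed mic spacing (m)

def stft(x):
    """STFT with a periodic Hann window: complex array (frames, N_FFT // 2 + 1)."""
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)
    n = 1 + (len(x) - N_FFT) // HOP
    return np.fft.rfft(np.stack([x[i * HOP:i * HOP + N_FFT] * win for i in range(n)]), axis=1)

def estimate_doas(X1, X2, n_src=2, min_sep=10.0):
    """Stand-in for the paper's ALMM: energy-weighted histogram of per-bin DOAs,
    using only the subband below C / (2 D) where the IPD cannot wrap."""
    f = np.fft.rfftfreq(N_FFT, 1 / FS)
    band = (f > 0) & (f < C / (2 * D))
    cross = X1[:, band] * np.conj(X2[:, band])           # IPD = angle(cross)
    sin_theta = C * np.angle(cross) / (2 * np.pi * f[band] * D)
    theta = np.degrees(np.arcsin(np.clip(sin_theta, -1.0, 1.0)))
    hist, edges = np.histogram(theta, bins=181, range=(-90.5, 90.5), weights=np.abs(cross))
    centers = (edges[:-1] + edges[1:]) / 2               # 1-degree bins: -90 ... 90
    doas = []
    for _ in range(n_src):
        i = np.argmax(hist)
        doas.append(centers[i])
        hist[np.abs(centers - centers[i]) <= min_sep] = 0.0  # suppress this peak
    return np.sort(doas)

def ipd_masks(X1, X2, doas):
    """Assign every full-band TF bin to the DOA whose predicted IPD is closest
    after phase wrapping; return labels and one binary mask per source."""
    f = np.fft.rfftfreq(N_FFT, 1 / FS)
    ipd = np.angle(X1 * np.conj(X2))[:, :, None]                      # (frames, bins, 1)
    pred = 2 * np.pi * f[:, None] * D * np.sin(np.radians(doas)) / C  # (bins, n_src)
    labels = np.argmin(np.abs(np.angle(np.exp(1j * (ipd - pred)))), axis=2)
    return labels, [labels == j for j in range(len(doas))]

# Toy mixture: mic 2 hears each source delayed by tau = D * sin(theta) / C.
t = np.arange(32 * N_FFT) / FS
def tones(bins, tau):
    """Sum of bin-centred cosines delayed by tau seconds (stand-in for speech)."""
    return sum(np.cos(2 * np.pi * (k * FS / N_FFT) * (t - tau)) for k in bins)

src1, src2 = [20, 36, 52, 68, 240], [28, 44, 60, 76, 224]  # STFT bin indices
tau1, tau2 = (D * np.sin(np.radians(th)) / C for th in (-40.0, 30.0))
x1 = tones(src1, 0.0) + tones(src2, 0.0)
x2 = tones(src1, tau1) + tones(src2, tau2)

X1, X2 = stft(x1), stft(x2)
doas = estimate_doas(X1, X2)
labels, masks = ipd_masks(X1, X2, doas)
Y1 = masks[0] * X1  # masked estimate of source 1 (DOA -40 degrees)
```

Comparing wrapped phase errors, rather than converting each high-frequency IPD back to an angle, is what lets the 7000 Hz and 7500 Hz tones be assigned to the correct source even though their IPDs wrap past ±π.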

Keywords

Time–frequency masking; Direction of arrival; Modulation frequency