Article ID | Journal | Published Year | Pages | File Type
564578 | Digital Signal Processing | 2015 | 10 Pages | PDF
Abstract

• We propose a two-microphone, model-based algorithm for separating moving sound sources.
• We use a spatial model of the sources and separate the source signals accordingly.
• We employ an expectation-maximization algorithm to initialize the model parameters.
• We derive a maximum likelihood linear regression algorithm to adapt the model parameters to new source locations.

This paper describes a system for separating multiple moving sound sources from two-channel recordings, based on spatial cues and a model adaptation technique. We employ a statistical model of the observed interaural level and phase differences, whose parameters are estimated by maximum likelihood using an expectation-maximization algorithm. The model is used to partition spectrogram points into clusters (one cluster per source) and to generate the corresponding spectrogram masks for isolating individual sound sources. We follow a maximum likelihood linear regression (MLLR) approach to track source relocations and adapt the model parameters accordingly. The proposed algorithm can separate more sources than there are input channels, i.e., it works in the underdetermined setting. In simulated anechoic and reverberant environments with two and three speakers, the proposed model-adaptation algorithm yields a gain of more than 10 dB in signal-to-noise ratio improvement for azimuthal source relocations of 15° or more. Moreover, this performance gain is achievable with only 0.6 seconds of the input mixture received after relocation.
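The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes a one-dimensional Gaussian mixture fitted to the interaural level difference (ILD) alone, whereas the paper models level and phase differences jointly and adds MLLR adaptation, which is omitted here. The function names `interaural_cues` and `em_soft_masks` are hypothetical, and the code uses only the standard library and flat lists of time-frequency points to stay self-contained; a practical version would likely vectorize this with an array library.

```python
import cmath
import math


def interaural_cues(left, right, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for paired complex STFT values.

    `left` and `right` are equal-length sequences of complex spectrogram
    points, flattened over time and frequency (an assumption of this sketch).
    """
    ild = [20.0 * math.log10((abs(l) + eps) / (abs(r) + eps))
           for l, r in zip(left, right)]
    ipd = [cmath.phase(l * r.conjugate()) for l, r in zip(left, right)]
    return ild, ipd


def em_soft_masks(x, n_sources, n_iter=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture to feature values `x` with EM.

    Returns (posteriors, means). posteriors[i][k] is the probability that
    point i belongs to source k and can serve as a soft spectrogram mask.
    """
    n = len(x)
    xs = sorted(x)
    # Assumed initialization: component means at evenly spaced quantiles.
    means = [xs[int((k + 0.5) / n_sources * n)] for k in range(n_sources)]
    mu = sum(x) / n
    var0 = sum((v - mu) ** 2 for v in x) / n + var_floor
    variances = [var0] * n_sources
    weights = [1.0 / n_sources] * n_sources
    post = []
    for _ in range(n_iter):
        # E-step: posterior of each component for each point (log domain).
        post = []
        for v in x:
            logp = [math.log(weights[k])
                    - 0.5 * math.log(2.0 * math.pi * variances[k])
                    - 0.5 * (v - means[k]) ** 2 / variances[k]
                    for k in range(n_sources)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            post.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(n_sources):
            nk = sum(row[k] for row in post) + 1e-12
            means[k] = sum(row[k] * v for row, v in zip(post, x)) / nk
            variances[k] = (sum(row[k] * (v - means[k]) ** 2
                                for row, v in zip(post, x)) / nk + var_floor)
            weights[k] = nk / n
    return post, means
```

Multiplying the mixture spectrogram by column k of the posteriors (or by a thresholded version of it) gives an estimate of source k. The variance floor is a design choice of this sketch that keeps the E-step finite when a cluster collapses onto nearly identical ILD values.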
