Article ID: 558353
Journal: Computer Speech & Language
Published Year: 2013
Pages: 20 Pages
File Type: PDF
Abstract

Separating the speech signals of multiple simultaneous talkers in a reverberant enclosure is known as the cocktail party problem. Real-time applications require online solutions that separate the signals as they are observed, in contrast to offline solutions that separate them after observation. A talker may also move, and the separation system should account for this. This work proposes an online method for speaker detection, speaker direction tracking, and speech separation. The separation is based on multiple acoustic source tracking (MAST) using Bayesian filtering and time–frequency masking. The capability of the method to separate up to four simultaneously active speakers is evaluated with measurements from three room environments with varying amounts of reverberation, using two different microphone array designs. Separation of moving talkers is also considered. Results are compared to two reference methods: ideal binary masking (IBM) and oracle tracking (O-T). Simulations are used to evaluate the effect of the number of microphones and their spacing.
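To make the ideal binary masking (IBM) reference concrete, the following is a minimal sketch of the standard definition: a time–frequency bin is kept when the target is stronger than the interference by more than a threshold, and zeroed otherwise. The function names, the nested-list layout of the time–frequency coefficients, and the 0 dB default threshold are assumptions made for illustration; the abstract does not specify how the IBM reference was computed in the paper.

```python
import math


def ideal_binary_mask(target, interference, threshold_db=0.0):
    """Return a 0/1 mask with one entry per time-frequency bin.

    target, interference: 2-D lists [frequency][time] of complex
    time-frequency coefficients of the isolated target and of the
    summed interfering sources (hypothetical layout, for illustration).
    """
    eps = 1e-12  # avoids division by zero and log of zero
    mask = []
    for target_row, interference_row in zip(target, interference):
        row = []
        for s, n in zip(target_row, interference_row):
            # Local signal-to-interference ratio of this bin in dB.
            ratio_db = 10.0 * math.log10((abs(s) ** 2 + eps) / (abs(n) ** 2 + eps))
            row.append(1.0 if ratio_db > threshold_db else 0.0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Weight each mixture bin by the corresponding mask value."""
    return [
        [m * x for x, m in zip(mixture_row, mask_row)]
        for mixture_row, mask_row in zip(mixture, mask)
    ]
```

The mask is called "ideal" because it needs the isolated target and interference signals, which are unavailable in practice; this is why it serves as a reference rather than as a deployable method.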

► Simultaneous speakers are tracked using particle filtering from microphone array data.
► Tracks are used to obtain time–frequency weights for speech separation.
► Speech from simultaneous speakers can be extracted in various environments.
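As an illustration of the particle filtering idea named in the highlights, the sketch below tracks the direction of a single source with a generic bootstrap particle filter. It is not the paper's MAST method: the random-walk motion model, the Gaussian likelihood on a noisy direction observation, the single-source restriction, and all function and parameter names are assumptions chosen to keep the example short.

```python
import math
import random


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def circular_mean(angles, weights):
    """Weighted mean of angles, computed on the unit circle."""
    s = sum(w * math.sin(a) for a, w in zip(angles, weights))
    c = sum(w * math.cos(a) for a, w in zip(angles, weights))
    return math.atan2(s, c)


def track_direction(observations, n_particles=500, process_std=0.05,
                    obs_std=0.2, seed=0):
    """Bootstrap particle filter over one direction angle (radians).

    observations: noisy direction measurements, one per time frame
    (an assumed measurement type, for illustration only).
    Returns the estimated direction for each frame.
    """
    rng = random.Random(seed)
    # Start with no prior knowledge: particles spread over the full circle.
    particles = [rng.uniform(-math.pi, math.pi) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: random-walk motion model.
        particles = [wrap(p + rng.gauss(0.0, process_std)) for p in particles]
        # Update: Gaussian likelihood of the wrapped angular error.
        weights = [math.exp(-0.5 * (wrap(z - p) / obs_std) ** 2) + 1e-300
                   for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(circular_mean(particles, weights))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Because the filter processes one observation per frame and keeps only the current particle set, it runs online in the sense of the abstract: an estimate is available as soon as each frame is observed.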

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing