Localizing speakers in multiple rooms by using Deep Neural Networks

Article ID	Journal	Published Year	Pages	File Type
6951497	Computer Speech & Language	2018	50 Pages	PDF

Abstract

For comparison, two algorithms proposed in the literature for this application context are evaluated: the Crosspower Spectrum Phase Speaker Localization (CSP-SLOC) and the Steered Response Power using the Phase Transform speaker localization (SRP-SLOC). Besides providing an extensive analysis of the proposed method, the article shows that the DNN-based algorithm significantly outperforms the state-of-the-art approaches evaluated on the DIRHA dataset, achieving an average localization error, expressed as Root Mean Square Error (RMSE), of 324 mm and 367 mm for the Simulated and Real subsets, respectively.
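The abstract relies on two quantities that are easy to illustrate: the Crosspower Spectrum Phase (CSP) coherence function behind the CSP-SLOC baseline, and the RMSE used to report results. The following is a minimal, standard-library-only Python sketch under stated assumptions. `csp_tdoa` and `localization_rmse` are hypothetical helper names. A naive DFT stands in for an FFT. Only one microphone pair with integer, circular delays is handled; the baseline's step of combining several pairs and mapping delays to a position is not shown. The RMSE is assumed to be taken over the Euclidean distance between estimated and reference positions.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT using only the standard library (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Naive inverse DFT returning the real part (the CSP of two real signals is real)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def csp_tdoa(x1, x2, eps=1e-12):
    """Hypothetical helper: integer delay (in samples) of x2 relative to x1,
    taken as the peak of the Crosspower Spectrum Phase (GCC-PHAT) function.
    Delays are treated as circular, so |delay| must be below len(x1) / 2."""
    if len(x1) != len(x2):
        raise ValueError("frames must have the same length")
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    # Cross-power spectrum normalized to unit magnitude: only the phase is kept.
    cross = [a.conjugate() * b for a, b in zip(X1, X2)]
    phat = [c / (abs(c) + eps) for c in cross]
    csp = idft_real(phat)
    peak = max(range(n), key=lambda t: csp[t])
    # Circular indices above n/2 correspond to negative delays.
    return peak if peak <= n // 2 else peak - n


def localization_rmse(estimates, references):
    """Hypothetical helper: RMSE of the Euclidean distance between estimated and
    reference positions, in the same unit as the inputs (e.g. mm)."""
    if not estimates or len(estimates) != len(references):
        raise ValueError("need equally long, non-empty position lists")
    squared = [sum((e - r) ** 2 for e, r in zip(p_hat, p))
               for p_hat, p in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


# Usage: a synthetic frame and a copy of it delayed by 3 samples.
rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(64)]
x2 = x1[-3:] + x1[:-3]  # x2[t] = x1[t - 3], so x2 lags x1 by 3 samples
print(csp_tdoa(x1, x2))  # 3
print(localization_rmse([(3.0, 4.0)], [(0.0, 0.0)]))  # 5.0
```

Dividing the delay by the sampling rate gives it in seconds. A practical implementation would use an FFT, zero-pad the frames to avoid circular wrap-around, and interpolate around the peak for sub-sample resolution.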

Keywords

Convolutional neural networks; Deep neural networks; Speaker localization; Acoustic source localization