Multi-stream speech recognition based on Dempster

Article ID	Journal	Published Year	Pages	File Type
568763	Speech Communication	2010	10 Pages	PDF

Abstract

This paper investigates the use of the Dempster–Shafer (DS) combination rule for multi-stream automatic speech recognition (ASR). The DS combination is based on a generalization of the conventional Bayesian framework. The main motivation for this work is the similarity between the DS combination and Fletcher's findings on human speech recognition. Experiments are based on the combination of several Multi Layer Perceptron (MLP) classifiers trained on different representations of the speech signal. The TANDEM framework is adopted in order to use the MLP outputs in conventional speech recognition systems. We investigate several methods for applying the DS combination to multi-stream ASR. Experiments are run on small and large vocabulary speech recognition tasks and compare the proposed technique with other frame-based combination rules (e.g. inverse entropy). Results reveal that the proposed method outperforms conventional combination rules in both tasks. Furthermore, we verify that the performance of the combined feature stream is never inferior to the performance of the best individual feature stream. We conclude by discussing other applications of the DS combination and possible extensions.
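To make the two kinds of frame-level combination named in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It shows the textbook form of Dempster's rule for two mass functions alongside a simple inverse-entropy weighting of posterior vectors. The function names, the `discount` parameter, and the way posteriors are turned into masses (discounted singletons, with the remainder assigned to the full class set as "ignorance") are all assumptions made for illustration; the abstract does not specify how the paper assigns masses.

```python
import math


def inverse_entropy_combine(posteriors):
    """Average per-stream posteriors, weighting each stream by 1/entropy.

    `posteriors` is a list of probability vectors, one per stream,
    all for the same frame. Low-entropy (confident) streams get more weight.
    """
    eps = 1e-12  # guards against log(0) and division by zero
    entropies = [-sum(p * math.log(p + eps) for p in post) for post in posteriors]
    inverse = [1.0 / (h + eps) for h in entropies]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    n_classes = len(posteriors[0])
    return [
        sum(w * post[k] for w, post in zip(weights, posteriors))
        for k in range(n_classes)
    ]


def posterior_to_mass(posterior, classes, discount):
    """Hypothetical mass assignment (an assumption, not taken from the paper).

    Each class receives `discount * posterior` on its singleton set; the
    remaining mass goes to the full set of classes. Assumes at least two classes.
    """
    theta = frozenset(classes)
    mass = {frozenset([c]): discount * p for c, p in zip(classes, posterior)}
    mass[theta] = mass.get(theta, 0.0) + (1.0 - discount)
    return mass


def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Mass functions are dicts mapping frozenset focal elements to masses.
    Products whose focal elements do not intersect count as conflict and
    are removed by renormalisation.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    norm = 1.0 - conflict
    return {focal: mass / norm for focal, mass in combined.items()}


if __name__ == "__main__":
    classes = ["a", "b"]
    stream1 = [0.9, 0.1]  # a confident stream
    stream2 = [0.5, 0.5]  # an uninformative stream
    print(inverse_entropy_combine([stream1, stream2]))
    m1 = posterior_to_mass(stream1, classes, discount=0.8)
    m2 = posterior_to_mass(stream2, classes, discount=0.8)
    print(dempster_combine(m1, m2))
```

Unlike a weighted average of posteriors, the DS formulation can place mass on sets of classes, so a stream can express that it does not know which class is present rather than being forced to spread probability over individual classes.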

Keywords

Multi Layer Perceptron