Maximum-likelihood normalization of features increases the robustness of neural-based spoken human-computer interaction

Article ID	Journal	Published Year	Pages	File Type
6941078	Pattern Recognition Letters	2015	10 Pages	PDF

Abstract

Robust acoustic modeling is essential in the development of automatic speech recognition systems applied to spoken human-computer interaction. To this end, traditional hidden Markov models (HMM) may be improved by hybridizing them with artificial neural networks (ANN). Crucially, ANNs require input values that do not compromise their numerical stability. Despite the relevance of feature normalization to the success of ANNs in real-world applications, the issue is mostly overlooked on the false premise that “any normalization technique will do”. The paper proposes a gradient-ascent, maximum-likelihood algorithm for feature normalization. Relying on mixtures of logistic densities, it ensures ANN-friendly values that are uniformly distributed over the (0, 1) interval. Some useful properties of the approach are discussed. The algorithm is applied to the normalization of acoustic features for a hybrid ANN/HMM speech recognizer. Experiments on real-world continuous speech recognition tasks are presented. The hybrid system turns out to be positively affected by the proposed technique.
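The abstract does not give the update equations, so the following is only a minimal one-dimensional sketch of the general idea, not the paper's algorithm. It assumes that a mixture of logistic densities is fitted to a single feature by batch gradient ascent on the average log-likelihood, and that the feature is then normalized by passing it through the mixture's cumulative distribution function (a weighted sum of sigmoids), which yields roughly uniform values in (0, 1) when the fit is good. All names (`fit`, `normalize`, `log_likelihood`), the parameterization (log-scales, softmax weights), the initialization, the learning rate and the synthetic data are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function (the CDF of a logistic density)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(a):
    m = max(a)
    e = [math.exp(v - m) for v in a]
    total = sum(e)
    return [v / total for v in e]

def log_likelihood(xs, mus, log_s, a):
    """Average log-likelihood of xs under the logistic mixture."""
    ws = softmax(a)
    total = 0.0
    for x in xs:
        p = 0.0
        for mu, t, w in zip(mus, log_s, ws):
            sg = sigmoid((x - mu) / math.exp(t))
            p += w * sg * (1.0 - sg) / math.exp(t)  # logistic pdf
        total += math.log(p + 1e-300)
    return total / len(xs)

def fit(xs, k=2, lr=0.05, steps=200):
    """Batch gradient ascent on the average log-likelihood (assumed setup).

    Parameters: means mus, log-scales log_s, and softmax logits a.
    With steps=0 this just returns the quantile-based initialization.
    """
    srt = sorted(xs)
    n = len(xs)
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    log_s = [0.0] * k
    a = [0.0] * k
    for _ in range(steps):
        ws = softmax(a)
        g_mu, g_t, g_a = [0.0] * k, [0.0] * k, [0.0] * k
        for x in xs:
            ss = [math.exp(t) for t in log_s]
            zs = [(x - mus[j]) / ss[j] for j in range(k)]
            sg = [sigmoid(z) for z in zs]
            dens = [ws[j] * sg[j] * (1.0 - sg[j]) / ss[j] for j in range(k)]
            p = sum(dens) + 1e-300
            for j in range(k):
                r = dens[j] / p  # responsibility of component j for x
                g_mu[j] += r * (2.0 * sg[j] - 1.0) / ss[j]
                g_t[j] += r * ((2.0 * sg[j] - 1.0) * zs[j] - 1.0)
                g_a[j] += r - ws[j]
        for j in range(k):
            mus[j] += lr * g_mu[j] / n
            log_s[j] += lr * g_t[j] / n
            a[j] += lr * g_a[j] / n
    return mus, log_s, a

def normalize(x, mus, log_s, a):
    """Map x into (0, 1) through the mixture CDF (a weighted sum of sigmoids)."""
    ws = softmax(a)
    return sum(w * sigmoid((x - mu) / math.exp(t))
               for mu, t, w in zip(mus, log_s, ws))

# Synthetic bimodal "feature" (illustrative data, not from the paper).
random.seed(0)
xs = ([random.gauss(-2.0, 0.5) for _ in range(200)]
      + [random.gauss(3.0, 1.0) for _ in range(200)])

ll_before = log_likelihood(xs, *fit(xs, steps=0))
params = fit(xs)
ll_after = log_likelihood(xs, *params)
ys = [normalize(x, *params) for x in xs]
```

In this sketch `ys` holds the normalized feature values, and comparing `ll_before` with `ll_after` shows whether the ascent improved the fit. A real acoustic front end would apply such a transform to each feature dimension separately; whether the paper does exactly that cannot be determined from the abstract.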

Keywords

Maximum likelihood estimation; Automatic speech recognition; Neural network; Feature normalization; Hidden Markov model