General hybrid framework for uncertainty-decoding-based automatic speech recognition systems

Published in Speech Communication (2016), 13 pages.

Abstract

• Using feature uncertainties during decoding can improve the noise robustness of automatic speech recognition (ASR).
• Uncertainty estimation in the recognition domain and in the pre-processing domain is compared.
• The distribution of the residual noise in the recognition domain is often not zero-mean.
• Uncertainties propagated from the pre-processing domain may suffer from approximation errors.
• The proposed hybrid approach mitigates these problems and achieves significant improvements.
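The highlights describe uncertainty decoding only at a high level. As a minimal sketch, assume a single diagonal-covariance Gaussian component N(μ, σ²) of the acoustic model and an additive Gaussian residual in the recognition domain with a possibly nonzero mean b (`unc_mean`) and variance σ_u² (`unc_var`), independent of the clean feature. Marginalizing over the clean feature then gives the enhanced feature x̂ the distribution N(μ + b, σ² + σ_u²), so the decoder scores x̂ with an inflated variance and a shifted mean instead of treating it as exact. The function names and this residual model are illustrative assumptions, not the estimator used in the paper; mixture weights, full covariances, and how the uncertainty statistics are actually obtained are omitted.

```python
import numpy as np


def diag_gauss_loglik(x, mean, var):
    """Log-density of a diagonal Gaussian N(mean, diag(var)) at x, summed over dimensions."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))


def uncertainty_decoding_loglik(x_hat, mu, sigma2, unc_var, unc_mean=None):
    """Score an enhanced feature x_hat against one acoustic-model component N(mu, sigma2).

    Assumed (hypothetical) model: x_hat = x + r, with clean feature x ~ N(mu, sigma2)
    and residual r ~ N(unc_mean, unc_var) independent of x. Marginalizing over x gives
    x_hat ~ N(mu + unc_mean, sigma2 + unc_var).
    """
    mu = np.asarray(mu, dtype=float)
    if unc_mean is None:
        # Zero-mean residual: the common simplification the highlights call into question.
        unc_mean = np.zeros_like(mu)
    mean = mu + np.asarray(unc_mean, dtype=float)
    var = np.asarray(sigma2, dtype=float) + np.asarray(unc_var, dtype=float)
    return diag_gauss_loglik(x_hat, mean, var)


# Usage: an outlying enhanced feature is penalized less once its uncertainty is accounted for.
conventional = diag_gauss_loglik([5.0], [0.0], [1.0])
with_uncertainty = uncertainty_decoding_loglik([5.0], [0.0], [1.0], unc_var=[4.0])
```

With `unc_var` set to zero and no bias, the score reduces to conventional decoding that treats the feature as deterministic; a nonzero `unc_mean` models the non-zero-mean residual mentioned in the highlights.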

Uncertainty decoding has recently improved automatic speech recognition (ASR) performance in noisy environments by treating the pre-processed feature vectors not as deterministic values but as random variables that contain estimation errors, residual noise, and artifacts introduced by the signal pre-processors themselves. However, the achievable improvements depend strongly on how accurately the statistics of these random variables are estimated in the recognition domain. In this paper, we compare two approaches for estimating these statistics. The first estimates the required statistics directly in the recognition domain. The second estimates them in the pre-processing domain and then propagates them through the typically nonlinear feature extraction to obtain the corresponding recognition-domain statistics. Based on this distinction, we propose a new hybrid approach that combines the advantages of both approaches while avoiding their disadvantages. Because the hybrid approach can be used with any speech pre-processor, it enables wider use of uncertainty decoding in place of the conventional maximum likelihood approach.
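The propagation route described in the abstract can be illustrated with a toy example. Assume (hypothetically) that the pre-processing domain supplies a Gamma-distributed band energy m with shape k and scale θ (for k = 1 this is the power of a single zero-mean complex Gaussian STFT coefficient) and that the feature extraction applies a logarithm. The sketch compares a first-order Taylor (linearized) propagation of the mean and variance through log(·) with a Monte Carlo reference. It is not the propagation scheme used in the paper; it only shows how a cheap approximation can misestimate both the mean and the variance of the recognition-domain feature.

```python
import numpy as np


def taylor_log_propagation(mean, var):
    """First-order Taylor propagation of (mean, var) through y = log(m)."""
    return np.log(mean), var / mean ** 2


def monte_carlo_log_propagation(samples):
    """Sample-based estimate of the mean and variance of y = log(m)."""
    y = np.log(samples)
    return y.mean(), y.var()


# Hypothetical pre-processing-domain uncertainty: band energy m ~ Gamma(shape=k, scale=theta).
k, theta = 1.0, 2.0
mean_m, var_m = k * theta, k * theta ** 2  # exact moments of the Gamma distribution

rng = np.random.default_rng(0)
samples = rng.gamma(shape=k, scale=theta, size=200_000)

mu_taylor, var_taylor = taylor_log_propagation(mean_m, var_m)
mu_mc, var_mc = monte_carlo_log_propagation(samples)
print(f"Taylor:      mean={mu_taylor:.3f}, var={var_taylor:.3f}")
print(f"Monte Carlo: mean={mu_mc:.3f}, var={var_mc:.3f}")
```

For k = 1 the exact values are E[log m] = log θ − γ ≈ 0.116 (γ is the Euler–Mascheroni constant) and Var[log m] = π²/6 ≈ 1.645, whereas the Taylor approximation yields log θ ≈ 0.693 and 1.0. The propagated mean is therefore biased and the variance underestimated, which is the kind of approximation error that directly estimating the statistics in the recognition domain avoids.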

Keywords

Uncertainty propagation; Speech enhancement; Uncertainty decoding