Weighted Viterbi decoding strategies for distributed speech recognition over IP networks

Speech Communication, 2006, 13 pages (Article ID 568970)

Abstract

Burst-like packet losses may seriously degrade the performance of Distributed Speech Recognition systems over IP networks. Several strategies that exploit the high correlation of the speech signal have already been proposed to overcome this problem. One of the most common mechanisms is to fill in the gap using the correctly received frames, as in nearest frame repetition (NFR), the mechanism proposed in the ETSI Aurora standard. Other strategies reconstruct the lost segment using interpolation algorithms or even more complex mechanisms. However, the effectiveness of these strategies may be seriously compromised if losses appear in long bursts. Interpolation may simply become infeasible if the lost segment is a lengthy one. In repetition algorithms, on the other hand, the unreliability of the substituted frames produces a high insertion rate that can increase the word error rate to the point where the mechanism becomes counterproductive. A feasible strategy for alleviating this effect is to use a soft-decoding algorithm, in which the reliability of each frame is taken into account at the decoding stage in order to mitigate the effect of incorrect feature vectors on recognition. In this paper we present a comprehensive study of the performance of several soft-decoding weighted Viterbi algorithms in a burst-like network environment. Tests are conducted using the Aurora 3 framework over three simulated network conditions, with the NFR algorithm as the baseline. Three techniques are explored in detail. The first uses a fixed weighting coefficient along the burst. This improves on the results of the basic NFR algorithm, but has the drawback of a strong dependence on mean burst length. A varying weighting coefficient is then proposed as a way to solve this problem.
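The interplay between frame repetition and weighted Viterbi decoding can be sketched as follows. This is a minimal illustration, not the paper's implementation, assuming an HMM scored in the log domain. NFR fills each lost frame with the nearest correctly received one, so for a burst between two received frames the first half copies the preceding frame and the second half the following one; the tie-breaking rule for odd-length bursts is our assumption. The decoder uses the common weighted Viterbi form, in which the observation likelihood is raised to a reliability exponent, i.e. delta_t(j) = max_i [delta_{t-1}(i) a_ij] * b_j(o_t)^gamma_t, so gamma_t = 1 trusts a frame fully and gamma_t = 0 ignores it. The names `nfr_reconstruct`, `fixed_weights`, `decaying_weights` and `weighted_viterbi` are hypothetical, and the geometric decay with distance is only one plausible example of a varying coefficient, not the schedule used in the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = List[float]


def nfr_reconstruct(frames: Sequence[Optional[Frame]]) -> Tuple[List[Frame], List[int]]:
    """Nearest frame repetition over bursts of lost frames (marked as None).

    Returns the reconstructed frames and, per frame, the distance to the
    frame that was copied (0 for correctly received frames).
    Assumption: on a tie (middle of an odd-length burst) the earlier frame wins.
    """
    received = [i for i, f in enumerate(frames) if f is not None]
    if not received:
        raise ValueError("no frame was received")
    out: List[Frame] = []
    dist: List[int] = []
    for t, f in enumerate(frames):
        if f is not None:
            out.append(list(f))
            dist.append(0)
            continue
        # Nearest received index; the second key breaks ties toward the earlier frame.
        src = min(received, key=lambda i: (abs(i - t), i))
        out.append(list(frames[src]))
        dist.append(abs(src - t))
    return out, dist


def fixed_weights(dist: Sequence[int], gamma: float) -> List[float]:
    """Weight 1.0 for received frames and a constant gamma inside every burst."""
    return [1.0 if d == 0 else gamma for d in dist]


def decaying_weights(dist: Sequence[int], alpha: float) -> List[float]:
    """Hypothetical varying weight: reliability decays as alpha**d, 0 < alpha < 1."""
    return [alpha ** d for d in dist]


def _scaled(weight: float, log_lik: float) -> float:
    # Avoid 0 * -inf = nan when a frame is ignored completely.
    return 0.0 if weight == 0.0 else weight * log_lik


def weighted_viterbi(
    log_b: Sequence[Sequence[float]],  # log_b[t][j] = log p(o_t | state j)
    log_a: Sequence[Sequence[float]],  # log_a[i][j] = log P(j | i), -inf if forbidden
    log_pi: Sequence[float],           # log initial-state probabilities
    weights: Sequence[float],          # per-frame reliability in [0, 1]
) -> Tuple[List[int], float]:
    """Viterbi search with each frame's log-likelihood scaled by its weight."""
    T, N = len(log_b), len(log_pi)
    delta = [log_pi[j] + _scaled(weights[0], log_b[0][j]) for j in range(N)]
    back: List[List[int]] = []
    for t in range(1, T):
        new_delta, ptr = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: delta[i] + log_a[i][j])
            new_delta.append(delta[best] + log_a[best][j] + _scaled(weights[t], log_b[t][j]))
            ptr.append(best)
        delta = new_delta
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(N), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(delta)
```

With a fixed coefficient every repeated frame is discounted equally regardless of where it falls in the burst, which is why its best value depends on the mean burst length; a distance-dependent weight instead trusts frames near the burst edges more than those in its middle.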
Two strategies, one heuristic and one data-driven, are proposed to obtain the variation in the weighting coefficient. Finally, a third method that exploits the different degrees of correlation of the individual components of the feature vector is presented.
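Weighting individual feature components, rather than whole frames, can be sketched as follows, assuming diagonal-covariance Gaussian output densities so that the frame log-likelihood is a sum of per-component terms, each scaled by its own weight. How the weights are derived here is our own hypothetical data-driven choice, not the paper's method: the normalized lag-d autocorrelation of each component, measured on clean training features and clipped at zero, is taken as the reliability of that component when it has been copied from a frame d positions away. The names `lag_correlation`, `component_weights` and `weighted_diag_gauss_loglik` are illustrative.

```python
import math
from typing import List, Sequence


def lag_correlation(track: Sequence[float], lag: int) -> float:
    """Normalized autocorrelation of one feature component at the given lag."""
    n = len(track)
    if lag >= n:
        return 0.0
    mean = sum(track) / n
    var = sum((x - mean) ** 2 for x in track)
    if var == 0.0:
        return 1.0  # a constant component is perfectly predicted by repetition
    cov = sum((track[t] - mean) * (track[t + lag] - mean) for t in range(n - lag))
    return cov / var


def component_weights(train: Sequence[Sequence[float]], dist: int) -> List[float]:
    """Hypothetical per-component weights for a frame repeated `dist` frames away."""
    dims = len(train[0])
    if dist == 0:
        return [1.0] * dims  # correctly received frame: trust every component
    return [max(0.0, lag_correlation([f[k] for f in train], dist)) for k in range(dims)]


def weighted_diag_gauss_loglik(
    obs: Sequence[float],
    mean: Sequence[float],
    var: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Diagonal-Gaussian log-likelihood with each component term scaled by its weight."""
    total = 0.0
    for o, m, v, w in zip(obs, mean, var, weights):
        if w == 0.0:
            continue  # component ignored entirely
        total += w * (-0.5 * (math.log(2.0 * math.pi * v) + (o - m) ** 2 / v))
    return total
```

Slowly varying components (high autocorrelation) keep most of their weight in a repeated frame, while fast-varying ones are largely discounted; the weighted sum then replaces the single frame-level weighted observation term in the Viterbi recursion.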

Keywords

Distributed speech recognition; Missing data