Article ID: 568712
Journal: Speech Communication
Published Year: 2011
Pages: 12 Pages
File Type: PDF
Abstract

This work presents a feature-extraction method based on the theory of invariant integration. The invariant-integration features (IIFs) are derived from an extended time period, and their computation has low complexity. Recognition experiments show that the presented feature type outperforms cepstral coefficients based on a mel filterbank (MFCCs) or a gammatone filterbank (GTCCs) in both matching and mismatching training-testing conditions. Even without any speaker adaptation, the presented features yield higher accuracies than MFCCs combined with vocal-tract-length normalization (VTLN) under matching training-testing conditions. It is also shown that IIFs can be successfully combined with additional speaker-adaptation methods to further increase accuracy. In addition to standard MFCCs, contextual MFCCs are introduced; their performance lies between that of MFCCs and IIFs.
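The abstract does not give the feature definition itself. As a rough illustration of the general idea behind invariant integration (averaging a monomial of time-frequency components over a group of translations, here subband shifts), the following Python sketch computes a single such feature from a spectrogram. The function name, clamping behavior, and all monomial parameters are illustrative assumptions and are not taken from the paper.

import numpy as np

# Minimal sketch (not the authors' exact formulation): one invariant-integration
# feature computed as a monomial of spectral values averaged over subband shifts.
# X is assumed to be a non-negative time-frequency representation
# (e.g., a mel or gammatone spectrogram) of shape (n_frames, n_bands).
def invariant_integration_feature(X, frame, bands, time_offsets, exponents, max_shift):
    """Average a monomial of spectral components over subband translations."""
    n_frames, n_bands = X.shape
    values = []
    for w in range(-max_shift, max_shift + 1):
        prod = 1.0
        for k, tau, rho in zip(bands, time_offsets, exponents):
            t = min(max(frame + tau, 0), n_frames - 1)  # clamp time index to valid range
            b = min(max(k + w, 0), n_bands - 1)          # shifted subband index, clamped
            prod *= X[t, b] ** rho
        values.append(prod)
    return np.mean(values)

# Toy usage on a random "spectrogram"; all parameter values are illustrative only.
X = np.abs(np.random.randn(100, 40)) + 1e-3
feat = invariant_integration_feature(
    X, frame=50, bands=[5, 12, 20], time_offsets=[-2, 0, 2],
    exponents=[1.0, 0.5, 1.0], max_shift=3)
print(feat)

Because each feature is a sum of products over a small window, the per-frame cost stays low, which is consistent with the abstract's claim of low computational complexity.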

Research highlights
► A feature-extraction method based on invariant integration is presented.
► Experiments show superior performance compared to standard features.
► The new features benefit from combination with speaker-adaptation methods.

Related Topics
Physical Sciences and Engineering › Computer Science › Signal Processing