An efficient low bit-rate compression scheme of acoustic features for distributed speech recognition

Article ID	Journal	Published Year	Pages	File Type
453608	Computers & Electrical Engineering	2016	14 Pages	PDF

Abstract

• A low bit-rate source coding scheme for distributed speech recognition (DSR) systems is proposed.
• The algorithm is based on weighted least squares (W-LS) polynomial approximation.
• The efficiency of the algorithm is tested with the noisy Aurora-2 database, for bit-rates ranging from 1400 bps to 1925 bps.
• The obtained results generally outperform the ETSI-AFE encoder for clean training and provide similar performance, at 1925 bps, for multi-condition training.

Due to limited network bandwidth, distributed speech recognition (DSR) services need a noise-robust, low bit-rate compression scheme for Mel frequency cepstral coefficients (MFCCs). In this paper, we present an efficient MFCC compression method based on weighted least squares (W-LS) polynomial approximation, which exploits the high correlation across consecutive MFCC frames. The polynomial coefficients are quantized with a tree-structured vector quantization (TSVQ) based scheme. Recognition experiments are conducted on the noisy Aurora-2 database under both clean and multi-condition training modes. The results show that the proposed W-LS encoder slightly exceeds the ETSI advanced front-end (ETSI-AFE) baseline system for bit-rates ranging from 1400 bps to 1925 bps under clean training mode. Under multi-condition training mode, however, a negligible degradation is observed (around 0.6% and 0.2% at 1400 bps and 1925 bps, respectively). Furthermore, the proposed encoder generally outperforms the ETSI-AFE source encoder at 4400 bps under clean training and provides similar performance, at 1925 bps, under multi-condition training.
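
As a rough illustration of the W-LS idea described in the abstract, the sketch below fits one low-order polynomial per cepstral coefficient across a block of consecutive frames, then rebuilds the frames from the polynomial coefficients. It is a minimal sketch and not the paper's implementation: the block length (8 frames), the polynomial order (2), the uniform weights, and the function names wls_polyfit and wls_reconstruct are all assumptions made for this example, and the TSVQ quantization stage is left out.

```python
import numpy as np


def wls_polyfit(block, weights, order):
    """Weighted least squares polynomial fit along the frame axis.

    block:   (n_frames, n_coeffs) array, one MFCC vector per frame
    weights: (n_frames,) non-negative per-frame weights (assumed form)
    order:   polynomial order (assumed value, not taken from the paper)
    returns: (order + 1, n_coeffs) polynomial coefficients, lowest power first
    """
    n_frames = block.shape[0]
    # Normalized time axis in [-1, 1] keeps the Vandermonde matrix well conditioned.
    t = np.linspace(-1.0, 1.0, n_frames)
    V = np.vander(t, order + 1, increasing=True)
    # Minimizing sum_i w_i * (y_i - V_i c)^2 is ordinary least squares
    # after scaling each row by sqrt(w_i).
    sw = np.sqrt(weights)[:, None]
    coeffs, *_ = np.linalg.lstsq(sw * V, sw * block, rcond=None)
    return coeffs


def wls_reconstruct(coeffs, n_frames):
    """Evaluate the fitted polynomials at every frame position of the block."""
    t = np.linspace(-1.0, 1.0, n_frames)
    V = np.vander(t, coeffs.shape[0], increasing=True)
    return V @ coeffs


# Toy block: 8 frames, 2 "cepstral" trajectories that are exactly quadratic,
# so an order-2 fit should reproduce them up to rounding error.
n_frames = 8
t = np.linspace(-1.0, 1.0, n_frames)
block = np.stack([1.0 + 2.0 * t, 0.5 - t**2], axis=1)
weights = np.ones(n_frames)  # uniform weights, purely illustrative

coeffs = wls_polyfit(block, weights, order=2)
recon = wls_reconstruct(coeffs, n_frames)
```

In this toy setting each trajectory is described by 3 polynomial coefficients instead of 8 frame values, which is where the reduction comes from before any quantization is applied. Real MFCC trajectories are not exact polynomials, so the reconstruction would be approximate and the choice of weights would matter.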

Keywords

Distributed speech recognition