Article code: 568652
Journal code: 1452040
Publication year: 2014
English article: 13-page PDF
Full-text version: free download
English title of the ISI article
Low bit rate compression methods of feature vectors for distributed speech recognition
Persian translation of the title
روش های کم فشرده سازی بیت برای بردارهای ویژگی برای تشخیص گفتار توزیع شده
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract


• We present a new family of very efficient feature vector compression schemes for ASR.
• They use vector quantization and prediction schemes jointly for feature compression.
• The improvement of using non-linear prediction with Neural Networks is evaluated.
• We propose using a multipath search coding strategy that allows global optimization.
• The methods are validated on small and large vocabulary corpora in clean and noisy conditions.

In this paper, we present a family of compression methods based on differential vector quantization (DVQ) for encoding Mel frequency cepstral coefficients (MFCC) in distributed speech recognition (DSR) applications. The proposed techniques exploit both the temporal correlation across consecutive MFCC frames and the intra-frame redundancy. We present DVQ schemes based on linear prediction and on non-linear prediction with multi-layer perceptrons (MLP). In addition, we propose a multipath search coding strategy based on the M-algorithm, which obtains the sequence of centroids that minimizes the quantization error globally, instead of selecting the centroids that minimize the quantization error locally on a frame-by-frame basis. We have evaluated the performance of the proposed methods on two different tasks. On the one hand, two small-vocabulary databases, SpeechDat-Car and Aurora 2, have been considered, obtaining negligible degradation in Word Accuracy (around 1%) compared to the unquantized scheme for bit rates as low as 0.5 kbps. On the other hand, for a large-vocabulary task (Aurora 4), the proposed method achieves a WER comparable to the unquantized scheme with only 1.6 kbps. Moreover, we propose a combined scheme (differential/non-differential) that gives the system the same sensitivity to transmission errors as previous multi-frame coding proposals for DSR.
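
As a rough illustration of the idea described in the abstract, the following sketch encodes a sequence of feature frames with a differential vector quantizer and an M-algorithm search. It is not the authors' implementation: the fixed first-order predictor coefficient `alpha`, the toy codebook, and the function names `dvq_encode` and `dvq_decode` are all assumptions made for this example. The abstract does not specify the predictor order, the codebook design, or the value of M.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dvq_encode(frames, codebook, alpha=0.9, m=1):
    """Differential VQ with a first-order linear predictor (assumed here).

    Each frame is predicted as alpha * (previous reconstructed frame) and
    the prediction residual is represented by one codebook centroid.
    m=1 picks the best centroid frame by frame (local decision);
    m>1 keeps the m lowest-error partial paths at every frame (M-algorithm).
    Returns (centroid indices, accumulated squared error) of the best path.
    """
    dim = len(frames[0])
    # A path is (accumulated error, chosen indices, last reconstructed frame).
    paths = [(0.0, [], [0.0] * dim)]
    for x in frames:
        candidates = []
        for err, idx, prev in paths:
            pred = [alpha * p for p in prev]
            for k, c in enumerate(codebook):
                recon = [p + ci for p, ci in zip(pred, c)]
                candidates.append((err + sq_dist(x, recon), idx + [k], recon))
        candidates.sort(key=lambda t: t[0])
        paths = candidates[:m]  # prune to the m best partial paths
    best_err, best_idx, _ = paths[0]
    return best_idx, best_err


def dvq_decode(indices, codebook, alpha=0.9):
    """Rebuild the frames from centroid indices with the same predictor."""
    prev = [0.0] * len(codebook[0])
    out = []
    for k in indices:
        prev = [alpha * p + c for p, c in zip(prev, codebook[k])]
        out.append(prev)
    return out


if __name__ == "__main__":
    # Toy one-dimensional data; real MFCC frames would have more dimensions.
    codebook = [[-1.0], [-0.25], [0.25], [1.0]]
    frames = [[0.6], [1.4], [0.2]]
    for m in (1, 4):
        indices, err = dvq_encode(frames, codebook, m=m)
        print(m, indices, round(err, 4))
```

Only the centroid indices need to be transmitted, so the bit rate is set by the codebook size and the frame rate. Keeping every path (m at least the codebook size raised to the number of frames) turns the search into an exhaustive one, which can never do worse than the frame-by-frame choice. For smaller m the pruned search is a heuristic and carries no such guarantee.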

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 58, March 2014, Pages 111–123