کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
565886 1452027 2015 19 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Regularized minimum variance distortionless response-based cepstral features for robust continuous speech recognition
ترجمه فارسی عنوان
به طور منظم حداقل واریانس ویژگی های رمزنگاری مبتنی بر پاسخ بدون وضوح برای تشخیص گفتاری مداوم قوی است
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر پردازش سیگنال
چکیده انگلیسی


• We study the low-variance and robust features for speech recognition system on the AURORA-4 corpus.
• We propose to compute cepstral features from a regularized MVDR (RMVDR) spectral estimates, denoted as RMVDR-based Cepstral Coefficient (RMCC) features.
• A sigmoid-shape auditory domain weighting rule is proposed for speech spectrum enhancement and incorporated in to the RMCC framework.
• We incorporate the medium duration power bias subtraction (MDPBS) method in to the RMCC framework.
• Two robust front-ends are proposed, robust RMCC (RRMCC) and Normalized RMCC (NRMCC) for speech recognition.

In this paper, we present robust feature extractors that incorporate a regularized minimum variance distortionless response (RMVDR) spectrum estimator instead of the discrete Fourier transform-based direct spectrum estimator, used in many front-ends including the conventional MFCC, to estimate the speech power spectrum. Direct spectrum estimators, e.g., single tapered periodogram, have high variance and they perform poorly under noisy and adverse conditions. To reduce this performance drop we propose to increase the robustness of speech recognition systems by extracting features that are more robust based on the regularized MVDR technique. The RMVDR spectrum estimator has low spectral variance and is robust to mismatch conditions. Based on the RMVDR spectrum estimator, robust acoustic front-ends, namely, are regularized MVDR-based cepstral coefficients (RMCC), robust RMVDR cepstral coefficients (RRMCC) and normalized RMVDR cepstral coefficients (NRMCC). In addition to the RMVDR spectrum estimator, RRMCC and NRMCC also utilize auditory domain spectrum enhancement methods, auditory spectrum enhancement (ASE) and medium duration power bias subtraction (MDPBS) techniques, respectively, to improve the robustness of the feature extraction method. Experimental speech recognition results are conducted on the AURORA-4 large vocabulary continuous speech recognition corpus and performances are compared with the Mel frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP), MVDR spectrum estimator-based MFCC, perceptual MVDR (PMVDR), cochlear filterbank cepstral coefficients (CFCC), power normalized cepstral coefficients (PNCC), ETSI advancement front-end (ETSI-AFE), and the robust feature extractor (RFE) of Alam et al. (2012). Experimental results demonstrate that the proposed robust feature extractors outperformed the other robust front-ends in terms of percentage word error rate on the AURORA-4 large vocabulary continuous speech recognition (LVCSR) task under clean and multi-condition training conditions. In clean training conditions, on average, the RRMCC and NRMCC provide significant reductions in word error rate over the rest of the front-ends. In multi-condition training, the RMCC, RRMCC, and NRMCC perform slightly better in terms of the average word error rate than the rest of the front-ends used in this work.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Speech Communication - Volume 73, October 2015, Pages 28–46
نویسندگان
, , ,