کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
6951688 | 1451701 | 2018 | 10 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
Studying the role of pitch-adaptive spectral estimation and speaking-rate normalization in automatic speech recognition
ترجمه فارسی عنوان
بررسی نقش برآورد طیفی تطبیقی زمین و میزان نرمال بودن گفتار در تشخیص گفتار خودکار
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
پردازش سیگنال
چکیده انگلیسی
In the context of automatic speech recognition (ASR) systems, the front-end acoustic features should not be affected by signal periodicity (pitch period). Motivated by this fact, we have studied the role of pitch-synchronous spectrum estimation approach, referred to as TANDEM STRAIGHT, in this paper. TANDEM STRAIGHT results in a smoother spectrum devoid of pitch harmonics to a large extent. Consequently, the acoustic features derived using the smoothed spectra outperform the conventional Mel-frequency cepstral coefficients (MFCC). The experimental evaluations reported in this paper are performed on speech data from a wide range of speakers belonging to different age groups including children. The proposed features are found to be effective for all groups of speakers. To further improve the recognition of children's speech, the effect of vocal-tract length normalization (VTLN) is studied. The inclusion of VTLN further improves the recognition performance. We have also performed a detailed study on the effect of speaking-rate normalization (SRN) in the context of children's speech recognition. An SRN technique based on the anchoring of glottal closure instants estimated using zero-frequency filtering is explored in this regard. SRN is observed to be highly effective for child speakers belonging to different age groups. Finally, all the studied techniques are combined for effective mismatch reduction. In the case of children's speech test set, the use of proposed features results in a relative improvement of 21.6% over the MFCC features even after combining VTLN and SRN.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Digital Signal Processing - Volume 79, August 2018, Pages 142-151
Journal: Digital Signal Processing - Volume 79, August 2018, Pages 142-151
نویسندگان
S. Shahnawazuddin, Nagaraj Adiga, Hemant K. Kathania, Gaydhar Pradhan, Rohit Sinha,