Article code: 6960908
Journal code: 1452018
Publication year: 2016
English article: 13-page PDF
Full-text version: free download
English title of the ISI article
Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription
Persian translation of the title
ترکیبی از مدل های صوتی متعدد با اقتباس بدون نظارت برای رونویسی سخنرانی
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract
Automatic speech recognition (ASR) systems have made considerable progress in real applications thanks to careful architecture design combined with advanced techniques and algorithms. However, designing a system that integrates these various techniques efficiently and achieves high performance is still a challenging task. In this paper, we introduce an ASR system based on ensemble model combination and adaptation, with two characteristics: (1) large-scale combination of multiple ASR systems based on Recognizer Output Voting Error Reduction (ROVER), and (2) multi-pass unsupervised speaker adaptation of the deep neural network acoustic models together with topic adaptation of the language model. The multiple acoustic models were trained with different acoustic features and model architectures, which provided complementary and discriminative information to the ROVER process. With these multiple acoustic models, the ROVER process yielded a better estimate of word confidence, which in turn helped in selecting data for unsupervised adaptation of the previously trained acoustic models. The final recognition result was obtained using multi-pass decoding, ROVER, and adaptation. We tested the system on lecture speech from Technology, Entertainment and Design (TED) talks, as used in the International Workshop on Spoken Language Translation (IWSLT) evaluation campaign, and obtained word error rates of 6.5%, 7.0%, 10.6%, and 8.4% on the 2011, 2012, 2013, and 2014 test sets, respectively, which to our knowledge are the best results for these evaluation sets.
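The voting and data-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the hypotheses from the different systems have already been aligned slot by slot (real ROVER builds this alignment with dynamic programming), it uses the commonly described score that mixes vote frequency with average word confidence, and the names `rover_vote`, `select_for_adaptation`, `alpha`, and `threshold` are hypothetical.

```python
from collections import defaultdict

NULL = "@"  # assumed placeholder for "no word in this slot"


def rover_vote(aligned_hyps, alpha=0.5):
    """Pick one word per slot from pre-aligned system outputs.

    aligned_hyps: one list per system, each a list of (word, confidence)
    pairs; all lists must have the same length (alignment is assumed).
    alpha: weight of vote frequency versus average confidence.
    Returns a list of (word, score) pairs, one per slot.
    """
    n_systems = len(aligned_hyps)
    voted = []
    for slot in zip(*aligned_hyps):
        counts = defaultdict(int)
        conf_sum = defaultdict(float)
        for word, conf in slot:
            counts[word] += 1
            conf_sum[word] += conf

        def score(word):
            freq = counts[word] / n_systems          # share of systems voting for it
            avg_conf = conf_sum[word] / counts[word]  # mean confidence of those votes
            return alpha * freq + (1 - alpha) * avg_conf

        best = max(counts, key=score)
        voted.append((best, score(best)))
    return voted


def select_for_adaptation(voted, threshold=0.7):
    """Keep only confidently voted words as pseudo-labels for adaptation."""
    return [w for w, s in voted if w != NULL and s >= threshold]


if __name__ == "__main__":
    # Three hypothetical systems, already aligned into three slots.
    hyps = [
        [("the", 0.9), ("cat", 0.6), ("sat", 0.8)],
        [("the", 0.8), ("cat", 0.7), ("sat", 0.9)],
        [("a", 0.4), ("hat", 0.9), ("sat", 0.7)],
    ]
    voted = rover_vote(hyps)
    print([w for w, _ in voted])
    print(select_for_adaptation(voted))
```

In this sketch, words on which the systems agree with high confidence pass the threshold and would be used as adaptation targets, while contested slots are left out; the threshold value is an illustrative assumption and is not taken from the paper.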
Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 82, September 2016, Pages 1-13
Authors
, , , , , , ,