Free article download: Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription

Article code	Journal code	Publication year	English article	Full-text version
6960908	1452018	2016	13-page PDF	Free download

English title of the ISI article

Combination of multiple acoustic models with unsupervised adaptation for lecture speech transcription



Keywords

Automatic speech recognition

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing


Abstract

Automatic speech recognition (ASR) systems have achieved considerable progress in real applications because of skilled architecture design with advanced techniques and algorithms. However, designing a system that efficiently integrates these various techniques to obtain advanced performance is still a challenging task. In this paper, we introduce an ASR system based on ensemble model combination and adaptation, with two characteristics: (1) large-scale combination of multiple ASR systems based on Recognizer Output Voting Error Reduction (ROVER), and (2) multi-pass unsupervised speaker adaptation of the deep neural network acoustic models and topic adaptation of the language model. The multiple acoustic models were trained with different acoustic features and model architectures, which helped to provide complementary and discriminative information in the ROVER process. With these multiple acoustic models, a better estimate of word confidence could be obtained from the ROVER process, which helped in selecting data for unsupervised adaptation of the previously trained acoustic models. The final recognition result was obtained using multi-pass decoding, ROVER, and adaptation processes. We tested the system on lecture speeches with topics related to Technology, Entertainment and Design (TED) that were used in the International Workshop on Spoken Language Translation (IWSLT) evaluation campaign, and obtained word error rates of 6.5%, 7.0%, 10.6%, and 8.4% on the 2011, 2012, 2013, and 2014 test sets, respectively, which to our knowledge are the best results for these evaluation sets.
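The voting and data-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the hypotheses from each recognizer are already aligned into equal-length slots (real ROVER first builds this alignment by dynamic programming), and it uses the fraction of agreeing systems as a stand-in for word confidence. The names `vote`, `confident_words`, `NULL`, and the threshold value are hypothetical.

```python
from collections import Counter

NULL = ""  # hypothetical marker for an empty slot in an aligned hypothesis


def vote(aligned_hyps):
    """Majority vote per slot over pre-aligned hypotheses.

    Returns a list of (word, agreement) pairs, where agreement is the
    fraction of systems that proposed the winning word. Ties go to the
    word proposed by the earliest system in the list.
    """
    n_systems = len(aligned_hyps)
    result = []
    for slot in zip(*aligned_hyps):
        word, count = Counter(slot).most_common(1)[0]
        result.append((word, count / n_systems))
    return result


def confident_words(voted, threshold=0.6):
    """Keep non-empty words whose agreement reaches the threshold.

    In an unsupervised adaptation setup, words like these could serve
    as pseudo-labels for adapting the acoustic models (an assumption
    about usage, not a description of the paper's exact procedure).
    """
    return [w for w, agreement in voted if w != NULL and agreement >= threshold]


# Three toy system outputs, already aligned slot by slot.
hyps = [
    ["the", "cat", "sat", NULL],
    ["the", "cat", "sat", "down"],
    ["a", "cat", "sad", "down"],
]
voted = vote(hyps)
combined = [w for w, _ in voted if w != NULL]
selected = confident_words(voted, threshold=0.9)
```

Raising the threshold trades adaptation data quantity for label reliability: a strict value keeps only words every system agrees on, while a looser one admits more data with more errors.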

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 82, September 2016, Pages 1-13

Authors

Peng Shen, Xugang Lu, Xinhui Hu, Naoyuki Kanda, Masahiro Saiko, Chiori Hori, Hisashi Kawai
