SMT-based ASR domain adaptation methods for under-resourced languages: Application to Romanian

Article ID	Journal ID	Year	Length
567033	1452042	2014	18 pages (PDF)

Keywords

Domain adaptation; Statistical machine translation; Automatic speech recognition; Under-resourced languages; Language modeling

Related subjects

Engineering and Basic Sciences; Computer Engineering; Signal Processing

Abstract

• The first large-vocabulary Romanian ASR system is presented.
• Phonetization and diacritics restoration systems for Romanian are introduced.
• An innovative ASR domain-adaptation methodology based on SMT is proposed.
• The semi-supervised adaptation methods are shown to improve ASR performance.

This study investigates the possibility of using statistical machine translation (SMT) to create domain-specific language resources. We propose a methodology for building a domain-specific automatic speech recognition (ASR) system for a low-resourced language when in-domain text corpora are available only in a high-resourced language. Several translation scenarios (both unsupervised and semi-supervised) are used to obtain domain-specific textual data. Moreover, this paper shows that a small amount of manually post-edited text is enough to develop other natural language processing systems that, in turn, can be used to automatically improve the machine-translated text, leading to a significant boost in ASR performance. At the end of the paper, an in-depth analysis explains why and how the machine-translated text improves the performance of the domain-specific ASR system. As by-products of this core domain-adaptation methodology, the paper also presents the first large-vocabulary continuous speech recognition system for Romanian, a diacritics restoration module for processing Romanian text corpora, and an automatic phonetization module needed to extend the Romanian pronunciation dictionary.
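The abstract mentions a diacritics restoration module for Romanian text corpora without describing how it works. As a rough illustration of the task only, the Python sketch below implements a simple word-level frequency baseline: for each diacritic-free form, it picks the spelling seen most often in a correctly diacritized corpus. This is an assumption for illustration, not the method used in the paper; the helper names `train_restorer` and `restore_diacritics` and the toy data are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word: str) -> str:
    """Remove combining marks, e.g. 'ă'->'a', 'î'->'i', 'ș'->'s', 'ț'->'t'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def train_restorer(corpus_words):
    """Hypothetical helper: for each diacritic-free form, count the spellings
    observed in a correctly diacritized training corpus."""
    table = defaultdict(Counter)
    for word in corpus_words:
        word = unicodedata.normalize("NFC", word)  # store one canonical encoding
        table[strip_diacritics(word)][word] += 1
    return table


def restore_diacritics(words, table):
    """Hypothetical helper: replace each word with its most frequent
    diacritized spelling; words never seen in training are left unchanged."""
    restored = []
    for word in words:
        candidates = table.get(strip_diacritics(word))
        restored.append(candidates.most_common(1)[0][0] if candidates else word)
    return restored


# Toy training data (not from the paper): correctly diacritized Romanian words.
training = ["și", "și", "si", "fată", "fată", "fata", "țară", "în"]
table = train_restorer(training)
print(restore_diacritics(["si", "fata", "tara", "in", "casa"], table))
# prints ['și', 'fată', 'țară', 'în', 'casa']
```

Because this baseline ignores context, it always outputs "fată" even where "fata" ("the girl") is correct; resolving such ambiguities requires a context-aware model, for example one that scores the candidate spellings with an n-gram language model.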

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 56, January 2014, Pages 195–212

Authors

Horia Cucu, Andi Buzo, Laurent Besacier, Corneliu Burileanu
