Article code: 485443
Journal code: 703327
Publication year: 2016
English article: 7 pages, PDF
English title of the ISI article
Lithuanian Broadcast Speech Transcription Using Semi-supervised Acoustic Model Training
Keywords
Automatic speech recognition, low-resource languages, semi-supervised training, neural networks, Lithuanian language
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Science (General)
Abstract

This paper reports on experimental work to build a speech transcription system for Lithuanian broadcast data, relying on unsupervised and semi-supervised training methods, as well as on other low-knowledge methods, to compensate for missing resources. Unsupervised acoustic model training is investigated using 360 hours of untranscribed speech data. A graphemic pronunciation approach is used to simplify generation of the pronunciation model and therefore ease language model adaptation for system users. Discriminative training on top of semi-supervised training is also investigated, as are various types of acoustic features and their combinations. Experimental results are provided for each development step, along with contrastive results comparing various options. Using the best system configuration, a word error rate of 18.3% is obtained on a set of development data from the Quaero program.
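The abstract states that a graphemic pronunciation approach simplifies pronunciation model generation. The following is a minimal sketch of the general idea, not the authors' implementation. It assumes one grapheme unit per lowercase letter. The paper's actual grapheme inventory, and any handling of digraphs, numbers, or foreign words, is not described here. Under this assumption a word's "pronunciation" is simply its letter sequence, so any word added to the language model vocabulary gets a lexicon entry without hand-written phonetic transcriptions. The function names `graphemic_pronunciation` and `build_lexicon` are hypothetical.

```python
import unicodedata


def graphemic_pronunciation(word: str) -> list[str]:
    """Hypothetical helper: map a word to a sequence of grapheme units.

    Assumption: one unit per letter; non-letters (hyphens, digits,
    apostrophes) are simply dropped rather than expanded.
    """
    # NFC composes letters such as 'š' or 'ų' into single code points,
    # so each Lithuanian letter yields exactly one grapheme unit.
    normalized = unicodedata.normalize("NFC", word.lower())
    return [ch for ch in normalized if ch.isalpha()]


def build_lexicon(words):
    """Hypothetical helper: build a word -> grapheme-sequence lexicon."""
    return {w: graphemic_pronunciation(w) for w in words}


if __name__ == "__main__":
    # Adding new vocabulary words requires no manual pronunciation work.
    lexicon = build_lexicon(["Lietuva", "žinios", "šiandien"])
    for word, units in lexicon.items():
        print(f"{word}: {' '.join(units)}")
```

A real system would typically normalize numbers and abbreviations into words before this step, because the sketch silently drops digits.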
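The reported result is a word error rate (WER). This is the minimum number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. A WER of 18.3% therefore corresponds to roughly 18 errors per 100 reference words. The sketch below uses a standard Levenshtein alignment over words. The function name `word_error_rate` is hypothetical. The sketch also omits the text normalization that scoring tools such as NIST's sclite usually apply before alignment, so it is not the paper's scoring setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.3f}")  # 0.333
```

Because insertions count as errors, WER can exceed 100% when the system outputs many spurious words.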

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 81, 2016, Pages 107–113