Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives

Keywords

Automatic transcription; Automatic speech recognition; Speaker diarization; Speaker adaptation

Abstract

• We propose a two-pass speaker-adaptive scheme operating at a low real-time factor.
• The scheme is based on integration of speaker diarization and adaptation methods.
• The transcripts from the first pass are used both for adaptation and for diarization.
• The first pass is carried out using a lexicon with a limited number of items.
• The adaptation approach is developed using recordings of various types of programs.
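The highlights describe a data flow in which a single first-pass transcript feeds two consumers: the diarization module and the adaptation step. A minimal Python sketch of that control flow follows. It assumes hypothetical stage callables (`first_pass`, `diarize`, `adapt`, `second_pass`) because the paper's actual interfaces are not given here, and it treats the overall RTF as total processing time divided by audio duration. The assumption that the second pass uses the full lexicon is ours.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stage interfaces (assumptions, not the authors' API).
FirstPass = Callable[[Any], List[str]]                            # audio -> words (reduced lexicon, SI models)
Diarizer = Callable[[Any, List[str]], List[int]]                  # audio, pass-1 words -> speaker cluster ids
Adapter = Callable[[Any, List[str], List[int]], Dict[int, Any]]   # -> adapted model per cluster
SecondPass = Callable[[Any, List[int], Dict[int, Any]], List[str]]


def run_two_pass(audio: Any, audio_seconds: float,
                 first_pass: FirstPass, diarize: Diarizer,
                 adapt: Adapter, second_pass: SecondPass,
                 ) -> Tuple[List[str], Dict[str, float], float]:
    """Run both decoding passes; return (final transcript, per-stage seconds, overall RTF)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    timings: Dict[str, float] = {}

    def timed(name: str, fn: Callable[..., Any], *args: Any) -> Any:
        # Measure wall-clock time of one stage and record it under `name`.
        start = time.perf_counter()
        result = fn(*args)
        timings[name] = time.perf_counter() - start
        return result

    # Pass 1: speaker-independent decoding with a reduced lexicon (cheaper, less accurate).
    hyp1 = timed("pass1", first_pass, audio)
    # The same pass-1 transcript is the input to diarization ...
    clusters = timed("diarization", diarize, audio, hyp1)
    # ... and the unsupervised adaptation target for each speaker cluster.
    models = timed("adaptation", adapt, audio, hyp1, clusters)
    # Pass 2: decoding with the cluster-adapted models (full lexicon assumed).
    hyp2 = timed("pass2", second_pass, audio, clusters, models)

    # RTF of the whole scheme: total processing time / duration of the audio.
    rtf = sum(timings.values()) / audio_seconds
    return hyp2, timings, rtf


def archive_processing_hours(archive_hours: float, rtf: float) -> float:
    """Single-stream processing time (hours) for an archive at a given overall RTF."""
    return archive_hours * rtf


if __name__ == "__main__":
    hyp, stage_times, rtf = run_two_pass(
        audio="dummy", audio_seconds=10.0,
        first_pass=lambda a: ["hello", "world"],
        diarize=lambda a, h: [0, 0],
        adapt=lambda a, h, c: {0: "adapted-model"},
        second_pass=lambda a, c, m: ["hello", "world"],
    )
    print(hyp)                                     # ['hello', 'world']
    print(archive_processing_hours(100_000, 0.5))  # 50000.0
```

At the archive's scale the RTF matters directly: with a hypothetical overall RTF of 0.5, one processing stream would need about 50,000 h for 100,000 h of audio, so every extra stage is paid for across the whole archive. This is why the scheme keeps to two decoding passes.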

This paper deals with speaker-adaptive speech recognition for large spoken archives. The goal is to improve the recognition accuracy of an automatic speech recognition (ASR) system that is being deployed to transcribe a large archive of Czech Radio. This archive represents a significant part of Czech cultural heritage, as it contains recordings covering 90 years of broadcasting. A large portion of these documents (100,000 h) is to be transcribed and made publicly available for browsing. To improve the transcription results, an efficient speaker-adaptive scheme is proposed. The scheme is based on the integration of speaker diarization and adaptation methods and, because of the archive's enormous size, is designed to achieve a low Real-Time Factor (RTF) for the entire adaptation process. It therefore employs just two decoding passes, the first of which is carried out using a lexicon with a reduced number of items. Moreover, the transcripts from the first pass serve not only for adaptation but also as the input to the speaker diarization module, which employs two-stage clustering. The diarization output is then used by a cluster-based unsupervised Speaker Adaptation (SA) approach that also exploits information about the gender of each individual speaker. Experimental results on various types of programs show that our adaptation scheme significantly reduces the Word Error Rate (WER), from 22.24% to 18.85% relative to the Speaker Independent (SI) system, while operating at a reasonable RTF.
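The reported improvement can be checked with standard WER arithmetic. The sketch below uses the usual definition, WER = (S + D + I) / N, computed with a word-level Levenshtein alignment, and then derives the absolute and relative reductions from the two figures quoted in the abstract. The helper names are illustrative and do not come from the paper.

```python
from typing import Sequence


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (word-level Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference word r
                         cur[j - 1] + 1,          # insert hypothesis word h
                         prev[j - 1] + (r != h))  # substitute (cost 1) or match (cost 0)
        prev = cur
    return prev[-1]


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Error reduction expressed as a fraction of the baseline error."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(wer("a b c d".split(), "a x c".split()))  # 0.5 (one substitution, one deletion)
    si, sa = 22.24, 18.85                            # WER (%) of the SI system and the adapted system
    print(f"absolute: {si - sa:.2f} points, relative: {relative_reduction(si, sa):.1%}")
    # absolute: 3.39 points, relative: 15.2%
```

The drop from 22.24% to 18.85% is therefore 3.39 points absolute, or roughly a 15% relative reduction in word errors.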

Publisher

Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 55, Issue 10, November–December 2013, Pages 1033–1046

Authors

Petr Cerva, Jan Silovsky, Jindrich Zdansky, Jan Nouza, Ladislav Seps
