Unsupervised training of an HMM-based self-organizing unit recognizer with applications to topic classification and keyword discovery

Article code	Journal code	Publication year	English article	Full-text version
558296	874892	2014	14-page PDF	Free download

Keywords

Unsupervised training; Speech recognition; Self-learning; Topic identification

Related topics

Engineering and Basic Sciences, Computer Engineering, Signal Processing

Abstract

• We present our approach to unsupervised training of HMM-based speech recognizers in domains where transcriptions do not exist.
• Our approach can be viewed as an extension of existing maximum likelihood HMM training.
• We propose an iterative learning algorithm using existing HMM trainers and recognizers.
• We demonstrate the utility of the approach on applications in topic identification and keyword discovery.
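
The iterative idea in the highlights above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes 1-D features, one fixed-variance Gaussian per unit, and a flat switch penalty in place of real HMM topologies, and the names `viterbi_decode`, `reestimate`, and `train_units` are hypothetical. The loop alternates between decoding a unit sequence with the current parameters and re-estimating the parameters by maximum likelihood from that sequence, so both the parameters and the "transcription" are optimized.

```python
def viterbi_decode(frames, means, var=1.0, switch_penalty=2.0):
    """Best unit label per frame under a toy model (an assumption, not the
    paper's model): each unit is a 1-D Gaussian with shared variance, and
    moving from one unit to another costs `switch_penalty` in log space."""
    k = len(means)

    def loglik(x, m):
        return -0.5 * (x - m) ** 2 / var

    score = [loglik(frames[0], m) for m in means]
    back = []  # back[t][u] = best predecessor of unit u at frame t + 1
    for x in frames[1:]:
        best_prev = max(range(k), key=lambda j: score[j])
        new_score, ptr = [], []
        for u in range(k):
            stay = score[u]
            switch = score[best_prev] - switch_penalty
            if stay >= switch:
                new_score.append(stay + loglik(x, means[u]))
                ptr.append(u)
            else:
                new_score.append(switch + loglik(x, means[u]))
                ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    # Trace the best path backwards from the best final unit.
    u = max(range(k), key=lambda j: score[j])
    labels = [u]
    for ptr in reversed(back):
        u = ptr[u]
        labels.append(u)
    labels.reverse()
    return labels


def reestimate(frames, labels, means):
    """Maximum-likelihood mean update given the current unit labels."""
    new_means = list(means)
    for u in range(len(means)):
        members = [x for x, lab in zip(frames, labels) if lab == u]
        if members:  # keep the old mean for units that received no frames
            new_means[u] = sum(members) / len(members)
    return new_means


def train_units(frames, init_means, iterations=10):
    """Alternate decoding and re-estimation until the labels stop changing."""
    means = list(init_means)
    labels = viterbi_decode(frames, means)
    for _ in range(iterations):
        means = reestimate(frames, labels, means)
        new_labels = viterbi_decode(frames, means)
        if new_labels == labels:
            break
        labels = new_labels
    return means, labels
```

Calling `train_units([0.1, -0.2, 0.0, 5.1, 4.9, 5.0, 0.2, -0.1], [1.0, 4.0])` groups the low-valued and high-valued frames into two units without any labels being supplied. A real system would replace the Gaussian means with full HMM acoustic models and the decoder with an existing recognizer.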

We present our approach to unsupervised training of speech recognizers. Our approach iteratively adjusts sound units that are optimized for the acoustic domain of interest. We thus enable the use of speech recognizers for applications in speech domains where transcriptions do not exist. The resulting recognizer is a state-of-the-art recognizer on the optimized units. Specifically, we propose building HMM-based speech recognizers without transcribed data by formulating the HMM training as an optimization over both the parameter and transcription sequence space. Audio is then transcribed into these self-organizing units (SOUs). We describe how SOU training can be easily implemented using existing HMM recognition tools. We tested the effectiveness of SOUs on the task of topic classification on the Switchboard and Fisher corpora. On the Switchboard corpus, the unsupervised HMM-based SOU recognizer, initialized with a segmental tokenizer, performed competitively with an HMM-based phoneme recognizer trained with 1 h of transcribed data, and outperformed the Brno University of Technology (BUT) Hungarian phoneme recognizer (Schwartz et al., 2004). We also report improvements, including the use of context-dependent acoustic models and lattice-based features, that together reduce the topic verification equal error rate from 12% to 7%. In addition to discussing the effectiveness of the SOU approach, we describe how we analyzed some selected SOU n-grams and found that they were highly correlated with keywords, demonstrating the ability of the SOU technology to discover topic-relevant keywords.
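
For readers unfamiliar with the equal error rate (EER) quoted in the abstract, the following is a minimal sketch of how such a figure is commonly computed from verification scores. The function name `equal_error_rate` and the simple threshold sweep are assumptions for illustration; the abstract does not say how the authors computed their numbers. The EER is the operating point at which the false-accept rate on non-target trials equals the false-reject rate on target trials.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The returned
    value is the mean of the false-accept and false-reject rates at the
    threshold where the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Target trials wrongly rejected at this threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # Non-target trials wrongly accepted at this threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With target scores `[0.9, 0.8, 0.7, 0.3]` and non-target scores `[0.6, 0.4, 0.2, 0.1]`, the two error rates meet at a threshold of 0.6, where one of four trials is misclassified on each side, giving an EER of 0.25.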

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 28, Issue 1, January 2014, Pages 210–223

Authors

Man-hung Siu, Herbert Gish, Arthur Chan, William Belfield, Steve Lowe
