Article code: 567491
Journal code: 876085
Publication year: 2012
English article: 12 pages, PDF
Full-text version: free download
English title of the ISI article
Leveraging word confusion networks for named entity modeling and detection from conversational telephone speech
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
Abstract

Named Entity (NE) detection from Conversational Telephone Speech (CTS) is important for business applications. However, Automatic Speech Recognition (ASR) output inevitably contains errors, which makes NE detection from CTS more difficult than from written text. One option for detecting NEs is a statistical NE model. To capture the nature of ASR errors, the NE model is usually trained on ASR one-best results rather than manually transcribed text, and is then applied to the ASR one-best results of speech that contains NEs. To make NE detection more robust to ASR errors, we propose using Word Confusion Networks (WCNs), which are sequences of bundled words, for both NE modeling and detection, treating the word bundles rather than independent words as the units. We realize this by clustering similar word bundles that may originate from the same word. We trained NE models that predict the NE tag sequence from the sequence of word bundles using the maximum entropy principle. Because the clustering of word bundles is performed before NE modeling, the proposed method can be combined with any NE modeling method. We conducted experiments using real-life call-center data. The results showed that using WCNs improved the accuracy of NE detection regardless of the NE modeling method.


► We used Word Confusion Networks (WCNs) both for Named Entity (NE) modeling and detection.
► Word bundles in WCNs are regarded as word vectors and are clustered.
► The same word can be represented as similar word vectors in WCNs.
► NE detection accuracy improved in the experiments with real-life call-center data.
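
The idea of treating word bundles as vectors and clustering them can be illustrated with a minimal sketch. The abstract does not say which similarity measure or clustering algorithm the authors used, so everything here is an assumption made for illustration: each bundle is a sparse vector of word posteriors, similarity is cosine, and clustering is a greedy single pass with a hypothetical `threshold` parameter. The names `cosine` and `cluster_bundles` and the sample posteriors are invented for this example.

```python
from math import sqrt
from typing import Dict, List

# A word bundle maps each competing word hypothesis to its posterior probability.
Bundle = Dict[str, float]


def cosine(a: Bundle, b: Bundle) -> float:
    """Cosine similarity between two bundles viewed as sparse word vectors."""
    dot = sum(p * b.get(w, 0.0) for w, p in a.items())
    norm_a = sqrt(sum(p * p for p in a.values()))
    norm_b = sqrt(sum(p * p for p in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_bundles(bundles: List[Bundle], threshold: float = 0.7) -> List[int]:
    """Greedy single-pass clustering (an assumed stand-in, not the paper's method).

    Each bundle joins the first cluster whose representative (its first member)
    has cosine similarity >= threshold; otherwise it opens a new cluster.
    """
    representatives: List[Bundle] = []
    labels: List[int] = []
    for bundle in bundles:
        for cluster_id, rep in enumerate(representatives):
            if cosine(bundle, rep) >= threshold:
                labels.append(cluster_id)
                break
        else:
            # No existing cluster was close enough: start a new one.
            representatives.append(bundle)
            labels.append(len(representatives) - 1)
    return labels


# Made-up bundles: the first two plausibly originate from the same spoken word.
bundles = [
    {"smith": 0.6, "smyth": 0.3, "myth": 0.1},
    {"smyth": 0.5, "smith": 0.4, "smit": 0.1},
    {"boston": 0.9, "austin": 0.1},
]
labels = cluster_bundles(bundles)
```

Under these assumptions, the resulting cluster labels could replace individual words as the input units of an NE tagger, which is consistent with the abstract's point that clustering happens before NE modeling and is independent of the modeling method.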

Publisher
Database: Elsevier - ScienceDirect
Journal: Speech Communication - Volume 54, Issue 3, March 2012, Pages 491–502