کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
6854763 | 1437594 | 2018 | 35 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
Resource2Vec: Linked Data distributed representations for term discovery in automatic speech recognition
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
هوش مصنوعی
پیش نمایش صفحه اول مقاله

چکیده انگلیسی
In this work we present a neural network embedding we call Resource2Vec, which is able to represent the resources that make up some Linked Data (LD) corpora. A vector representation of these resources allows more advantageous processing (in computational terms) as is the case with known word or document embeddings. We give a quantitative analysis for their study. Furthermore, we employ them in an Automatic Speech Recognition (ASR) task to demonstrate their functionality by designing a strategy for term discovery. This strategy permits out-of-vocabulary (OOV) terms in a Large Vocabulary Continuous Speech Recognition (LVCSR) system to be discovered and then put into the final transcription. First, we detect where a potential OOV term may have been uttered in the LVCSR output speech segments. Second, we carry out a candidate OOV search in some LD corpora. This search is oriented by distance measurements between the transcription context around the potential-OOV speech segment and the resources of the LD corpora in Resource2Vec format, obtaining a set of candidates. To rank them, we mainly depend on the phone transcription of that segment. Finally, we decide whether or not to incorporate a candidate into the final transcription. The results show we are able to improve the transcription in Word Error Rate (WER) terms significantly, after our strategy is used on speech in Spanish.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Expert Systems with Applications - Volume 112, 1 December 2018, Pages 301-320
Journal: Expert Systems with Applications - Volume 112, 1 December 2018, Pages 301-320
نویسندگان
Alejandro Coucheiro-Limeres, Javier Ferreiros-López, Rubén San-Segundo, Ricardo Córdoba,