Article code: 382920
Journal code: 660798
Publication year: 2015
English article: 12 pages, PDF
English title of the ISI article
Topic identification techniques applied to dynamic language model adaptation for automatic speech recognition
Keywords
Language model adaptation, topic identification, automatic speech recognition
Related subjects
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract


• We present an approach for the dynamic adaptation of the language models (LMs) used by a speech recognizer, based on topic identification.
• For each audio segment, the system interpolates a static LM with a topic-based LM in a two-stage recognition architecture.
• The interpolation weights are computed based on different sources of information (distance of LMs and topic similarity).
• We evaluate different strategies for generating topic-specific LMs and for the adaptation of the LM used in the final stage.
• Evaluation shows a significant reduction of the error rates for both tasks (topic identification and speech recognition).
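
The interpolation named in the highlights can be illustrated with a minimal sketch. It assumes toy unigram distributions stored as Python dicts; the names `interpolate_lm` and `weight_from_similarity`, the `max_weight` cap, and the example probabilities are hypothetical and are not taken from the paper, which works with full n-gram LMs and its own weighting schemes.

```python
def interpolate_lm(background, topic, lam):
    """Return lam * topic + (1 - lam) * background over the union vocabulary."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(background) | set(topic)
    return {
        w: lam * topic.get(w, 0.0) + (1.0 - lam) * background.get(w, 0.0)
        for w in vocab
    }


def weight_from_similarity(similarity, max_weight=0.8):
    """Hypothetical rule: map a topic similarity in [0, 1] to a weight."""
    clipped = min(max(similarity, 0.0), 1.0)
    return max_weight * clipped


# Toy unigram distributions (assumed values, each sums to 1).
background = {"the": 0.5, "vote": 0.2, "budget": 0.1, "fish": 0.2}
fisheries = {"the": 0.4, "fish": 0.5, "quota": 0.1}

lam = weight_from_similarity(0.5)
adapted = interpolate_lm(background, fisheries, lam)
```

Because both inputs are probability distributions and the weights sum to one, the interpolated model is again a distribution; a higher topic similarity shifts more probability mass toward topic-specific words such as "fish".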

In this paper we present an efficient speech recognition approach for multitopic speech that combines information retrieval techniques with topic-based language modeling. Information retrieval techniques, such as topic identification by means of Latent Semantic Analysis, are used to identify the topic of the recognized transcription of an audio segment. According to the confidence in the identified topics, we propose a dynamic language model adaptation to improve recognition performance in a two-stage automatic speech recognition (ASR) system. The adaptation scheme is a linear interpolation between a general background language model (LM) and a topic-dependent LM. We have studied different approaches for generating the topic-dependent LM and for determining the interpolation weight of this model with the background model. In one approach we use the topic labels given in the training dataset to obtain the topic models. In the other we separate the training documents into topic clusters using the k-means algorithm. To strengthen the adaptation models, we also use topic identification techniques to group documents without topic labels from the EUROPARL text database, increasing the amount of data available for training topic-specific language models. For evaluation we use the Spanish partition of the European Parliament Plenary Sessions (EPPS) database, from which we selected a subset with 67 labeled topics. For topic identification, our experiments show a relative reduction in topic identification error of 44.94% compared with the baseline method, the Generalized Vector Model with a classic TF–IDF weighting scheme. For dynamic LM adaptation applied to ASR, we achieved a relative reduction in WER of 13.52% over a single background language model.
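
As a rough illustration of the baseline style mentioned in the abstract (a vector model with TF–IDF weighting), the sketch below assigns a transcript to the topic whose TF–IDF vector is closest by cosine similarity. It is not the paper's Latent Semantic Analysis method or its exact Generalized Vector Model; the function names, the smoothed IDF formula, and the two toy topics are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors (as dicts) for tokenized docs; also return the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Adding 1 keeps terms that occur in every topic from vanishing (assumed smoothing).
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def identify_topic(transcript, topic_docs):
    """Return the best-matching topic label and the score for every topic."""
    labels = list(topic_docs)
    vectors, idf = tfidf_vectors([topic_docs[label] for label in labels])
    # Words unseen in the topic documents are ignored in the query vector.
    query = {t: c * idf[t] for t, c in Counter(transcript).items() if t in idf}
    scores = {label: cosine(query, vec) for label, vec in zip(labels, vectors)}
    return max(scores, key=scores.get), scores


# Toy topic documents and a first-pass transcript (hypothetical data).
topic_docs = {
    "fisheries": "fish quota fleet fish sea".split(),
    "budget": "budget deficit tax budget spending".split(),
}
best, scores = identify_topic("the fish quota for the fleet".split(), topic_docs)
```

In a two-stage setup like the one described, scores of this kind could serve as the topic confidence that drives the interpolation weight for the second recognition pass.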

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 42, Issue 1, January 2015, Pages 101–112