Article ID: 383015
Journal: Expert Systems with Applications
Published Year: 2013
Pages: 17 Pages
File Type: PDF
Abstract

We present an approach to dynamically adapt the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with information about the intentions of the speaker (inferred by the dialogue manager and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM and one or more of the LMs associated with the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, which are estimated as part of the inference procedure that determines the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals. In the first, we estimate an LM for each concept and each goal. In the second, we apply several clustering strategies to group together those elements that share common properties, and estimate an LM for each cluster. Our evaluation shows that the system can estimate a dynamic model adapted to each dialogue turn. This significantly improves speech recognition performance, which in turn improves both the language understanding and the dialogue management tasks.
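The interpolation step described above can be illustrated with a minimal sketch. The abstract does not give the exact weight formula, so the scheme below is an assumption: a fixed share of the probability mass stays with the background LM, and the remainder is split among the dialogue elements in proportion to their posterior probabilities. Unigram models stand in for the recognizer's LMs, and all names (`interpolation_weights`, `interpolate`, the `"background"` key, the toy Hi-Fi vocabulary) are hypothetical.

```python
from typing import Dict

# A toy unigram LM: word -> probability. Real systems would use n-gram models.
LM = Dict[str, float]


def interpolation_weights(posteriors: Dict[str, float],
                          background_weight: float) -> Dict[str, float]:
    """Assumed scheme: keep `background_weight` for the background LM and
    share the rest among dialogue elements in proportion to their posteriors."""
    if not 0.0 <= background_weight <= 1.0:
        raise ValueError("background_weight must lie in [0, 1]")
    total = sum(posteriors.values())
    if total <= 0.0:
        # No evidence for any concept or goal: fall back to the background LM.
        return {"background": 1.0}
    weights = {name: (1.0 - background_weight) * p / total
               for name, p in posteriors.items()}
    weights["background"] = background_weight
    return weights


def interpolate(models: Dict[str, LM], weights: Dict[str, float]) -> LM:
    """Linear interpolation: P(w) = sum_i weight_i * P_i(w)."""
    vocab = set().union(*(models[name] for name in weights))
    return {word: sum(weights[name] * models[name].get(word, 0.0)
                      for name in weights)
            for word in vocab}


# Toy models for one dialogue turn (illustrative numbers only).
models = {
    "background": {"play": 0.25, "stop": 0.25, "volume": 0.25, "cd": 0.25},
    "volume": {"play": 0.1, "stop": 0.1, "volume": 0.7, "cd": 0.1},
    "play": {"play": 0.6, "stop": 0.1, "volume": 0.1, "cd": 0.2},
}
# Posteriors for the dialogue elements, as a dialogue manager might report them.
posteriors = {"volume": 0.6, "play": 0.2}

weights = interpolation_weights(posteriors, background_weight=0.5)
adapted = interpolate(models, weights)
```

Because the weights sum to one and each component model is a proper distribution, the adapted model is also a proper distribution. In this toy turn the word "volume" gains probability relative to the background model, since the element with the highest posterior favors it.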

► We present an approach to dynamically adapt the language models used by a speech recognizer using dialogue-based information.
► On each dialogue turn, the system interpolates a static LM with several content-dependent models related to semantic and intention information.
► The system obtains the interpolation weights using the posterior probabilities of concepts and goals estimated by the dialogue manager.
► We evaluate two strategies to obtain the models (one LM for each element, and several clustering approaches to group dialogue elements).
► The evaluation shows a significant reduction of the error rates when adapting the LMs in a dialogue system used to control a Hi-Fi audio system.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence