Interaction Quality: Assessing the quality of ongoing spoken dialog interaction by experts

Article ID	Journal	Published Year	Pages	File Type
566716	Speech Communication	2015	25 Pages	PDF

Abstract

• We present the Interaction Quality (IQ) paradigm to model quality in ongoing spoken HMI.
• The model can be used to determine an objective quality score at arbitrary dialog points.
• IQ correlates strongly with real User Satisfaction (US).
• IQ predicts better than US and is easier to obtain.
• Only a minority of users show visible anger when they are dissatisfied.

This study presents a novel expert-based approach to assess the quality of ongoing Spoken Dialog System (SDS) interactions. We call this approach “Interaction Quality” (IQ). It is an objective measure which relies on statistical classification with Support Vector Machines (SVMs). We compare objective expert IQ annotations of ongoing SDS interactions with subjective User Satisfaction (US) ratings and show that IQ and US correlate (ρ=.66). Expert annotations thus mirror the subjective user impression to a great extent and are, above all, much easier to obtain. The IQ score that quantifies the quality of the interaction is generated using the median score of exchange annotations of several experts. US is tracked in a study with 38 users interacting with an SDS. A large, comprehensive set of domain-independent, automatic interaction parameters is introduced to quantify the interaction at arbitrary dialog exchanges. Furthermore, a manually annotated negative emotion feature is added to the parameter set in order to evaluate the contribution of emotions to the classification of IQ and US. For evaluation we use the CMU Let’s Go bus information system. The model yields a correlation of ρ=.80 when classifying IQ scores annotated in field data from the CMU system. Furthermore, the model achieves ρ=.74 for predicting US on lab data, and ρ=.89 for IQ on lab data. The presented approach outperforms related studies in the field. Only a marginal contribution of the emotion feature to the performance can be observed, implying that US is not influenced by visible emotions. We analyze causalities and correlations between the interaction parameters and the target variables US/IQ and identify relevant predictors. With the presented paradigm, critical dialogs can be found; once deployed as an online monitoring technique, this paradigm could render SDSs more user-friendly and improve user acceptance.
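
The two computations the abstract relies on, a median over several expert ratings per exchange and a rank correlation between IQ and US, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: it assumes that ρ denotes Spearman's rank correlation, and the ratings, the 1-5 scale, the number of experts, and the function names `iq_score`, `average_ranks` and `spearman_rho` are all made up for the example. The SVM classification step is not shown.

```python
from statistics import median


def iq_score(expert_ratings):
    """Median of the expert ratings given to a single exchange."""
    return median(expert_ratings)


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up data: three hypothetical experts rate five exchanges (assumed 1-5 scale).
expert_ratings = [[5, 5, 4], [4, 5, 4], [3, 4, 3], [2, 3, 2], [1, 2, 2]]
iq = [iq_score(r) for r in expert_ratings]

# Made-up user satisfaction ratings for the same five exchanges.
us = [5, 4, 4, 2, 1]

rho = spearman_rho(iq, us)
print(iq, round(rho, 2))
```

Taking the median rather than the mean keeps the combined score on the original rating scale when the number of raters is odd, and makes it robust to a single outlying expert.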

Keywords

PARADISE; Emotional state; User satisfaction; Spoken dialog systems; Online monitoring; Interaction parameters