Article ID: 4977795
Journal: Speech Communication
Published Year: 2017
Pages: 10
File Type: PDF
Abstract
In this study, we propose an efficient way to combine human and automated scoring to increase the reliability and validity of a system used to assess spoken responses in the context of an international English language assessment. A set of filtering systems is used to automatically identify classes of spoken responses that are difficult to score with an automated scoring system, for example, because of a high level of noise or imperfections in components of the overall system. These flagged responses are then routed to human raters for scoring. The vast majority of responses are not flagged by the filtering system and are scored by the automated scoring system, resulting in a hybrid scoring approach. The overall hybrid speech scoring system presented here comprises multiple subprocesses: recording of spoken responses, transcription based on an automatic speech recognizer, linguistic feature generation, filtering of problematic responses, automated score generation, human rater scoring, and final score combination. We evaluate this scoring approach with pilot data from a novel international English proficiency assessment. It achieves a substantial improvement in scoring performance and score validity with a limited amount of human scoring and most responses scored automatically: the correlation between the baseline system (baseline filtering with imputation) and human raters' scores is 0.72, and with an extended filtering model the performance improves to 0.82. The improvement can be attributed in part to the extended filtering model itself, which identifies more classes of non-scorable responses, and in part to the combination of machine and human scores in the hybrid system.
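The abstract describes a routing architecture rather than a specific implementation; the following is a minimal sketch of that hybrid routing idea, not the authors' code. The names (SpokenResponse, is_non_scorable, machine_score, human_score) are hypothetical placeholders for the filtering model, the automated scoring model, and the human rater queue described above.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpokenResponse:
    response_id: str
    audio_path: str
    asr_transcript: str   # output of the automatic speech recognizer
    features: dict        # linguistic features derived from the transcript

def hybrid_score(
    response: SpokenResponse,
    is_non_scorable: Callable[[SpokenResponse], bool],  # filtering model (hypothetical interface)
    machine_score: Callable[[SpokenResponse], float],   # automated scoring model
    human_score: Callable[[SpokenResponse], float],     # human rater scoring
) -> float:
    """Route a flagged response to a human rater; otherwise score it automatically."""
    if is_non_scorable(response):
        # Flagged as problematic, e.g. high noise or an upstream component failure.
        return human_score(response)
    return machine_score(response)
```

In this sketch, only the small fraction of flagged responses incurs human scoring cost, while all other responses are scored automatically, which mirrors the hybrid approach evaluated in the paper.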
Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing
Authors