Article ID Journal Published Year Pages File Type
344208 Assessing Writing 2015 17 Pages PDF
Abstract

• We used eye tracking to measure raters’ attention to analytic rubric subcomponents.
• Attention was associated with the essay raters’ inter-rater reliability estimates.
• Raters who agreed the most had common attentional foci across the subcomponents.
• Disagreeing raters read different parts of the rubric to justify their scores.
• We discuss rubric layout as an important factor in test-construct articulation.

We investigated how nine trained raters used a popular five-component analytic rubric by Jacobs et al. (1981; reproduced in Weigle, 2002). We recorded the raters’ eye movements while they rated 40 English essays because cognition drives eye movement (Reichle, Warren, & McConnell, 2009): by inspecting what raters attend to on a rubric, we gain insight into their thoughts. We estimated inter-rater reliability for each subcomponent. Attention (measured as total eye-fixation duration and eye-visit count, with the number of words per subcomponent controlled for) was associated with inter-rater reliability: organization (the second category) received the most attention (slightly more than the first category, content) and also had the highest inter-rater reliability (ICC = .92). Raters attended least to, and agreed least on, mechanics (the last category; ICC = .85). Raters who agreed the most shared common attentional foci across the subcomponents, whereas disagreements were directly visible in eye-movement heatmaps. We discuss the rubric in terms of primacy: raters paid the most attention to organization and content because these subcomponents appeared on the left and were read first. We hypothesize what would happen if test developers removed the least reliable (and right-most) subcomponent, mechanics. We discuss rubric design as an important factor in test-construct articulation.
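The abstract reports per-subcomponent reliability as ICC values but does not specify which ICC form or software was used. The sketch below is a minimal illustration, assuming a two-way model with essays as targets and raters as judges, of how an average-measures consistency ICC (ICC(3,k) in Shrout & Fleiss's terminology) could be computed per subcomponent; the data layout (an essays × raters score matrix) and the simulated scores are hypothetical, not the authors' data.

```python
import numpy as np

def icc_3k(scores: np.ndarray) -> float:
    """Average-measures, two-way, consistency ICC (Shrout & Fleiss ICC(3,k)).

    `scores` is an (n_essays, n_raters) matrix of rubric scores for one
    subcomponent (e.g., organization). Hypothetical layout for illustration.
    """
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-essay means
    col_means = scores.mean(axis=0)   # per-rater means

    # Two-way ANOVA sums of squares
    ss_rows = k * np.sum((row_means - grand) ** 2)   # between essays
    ss_cols = n * np.sum((col_means - grand) ** 2)   # between raters
    ss_total = np.sum((scores - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows   # ICC(3,k)

# Hypothetical example: 40 essays x 9 raters of simulated organization scores
rng = np.random.default_rng(0)
true_quality = rng.normal(15, 3, size=(40, 1))
ratings = true_quality + rng.normal(0, 1, size=(40, 9))
print(f"ICC(3,k) = {icc_3k(ratings):.2f}")
```

An absolute-agreement form such as ICC(2,k) would additionally penalize systematic differences in rater severity; which form the study used is not stated in the abstract.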

Related Topics
Social Sciences and Humanities › Arts and Humanities › Language and Linguistics
Authors