A Many-Facet Rasch analysis comparing essay rater behavior on an academic English reading/writing test used for two purposes

Article ID	Journal	Published Year	Pages	File Type
4935799	Assessing Writing	2016	11 Pages	PDF

Abstract

Second language (L2) writing researchers have noted that various rater and scoring variables may affect ratings assigned by human raters (Cumming, 1990; Vaughan, 1991; Weigle, 1994, 1998, 2002; Cumming, Kantor, & Powers, 2001; Lumley, 2002; Barkaoui, 2010). Contrast effects (Daly & Dickson-Markman, 1982; Hales & Tokar, 1975; Hughes, Keeling, & Tuck, 1983), or how previous scores impact later ratings, may also color raters' judgments of writing quality. However, little is known about how raters use the same rubric for different examinee groups. The present paper concerns an integrated reading and writing test of academic English used at a U.S. university for both admissions and placement purposes. Raters are trained to interpret the analytic scoring rubric similarly no matter which test type is scored. Using Many-Facet Rasch measurement (Linacre, 1989/1994), I analyzed scores over seven semesters, examining rater behavior on two test types (admissions or placement). Results indicated that, of 25 raters, five raters showed six instances of statistically significant bias on admissions or placement tests. The findings suggest that raters may be attributing scores to a wider range of writing ability levels on admissions than on placement tests. Implications for assessment, rater perceptions, and small-scale academic testing programs are discussed.

Keywords

Many-facet Rasch measurement; Rater variability; Second language writing assessment