Article ID: 1083699
Journal: Journal of Clinical Epidemiology
Published Year: 2008
Pages: 6
File Type: PDF
Abstract

Objective: Any attempt to generalize the performance of a subjective diagnostic method should account for sampling variation in both cases and readers. Most current measures of test performance, especially indices of reliability, address only the variation in cases and are therefore not suitable for generalizing results across the population of readers. We studied the effect of reader variation on two measures of multireader reliability: pair-wise agreement and Fleiss' kappa.

Study Design and Setting: We used a normal hierarchical model with a latent trait (signal) variable to simulate a binary decision-making task performed by different numbers of readers on an infinite sample of cases.

Results: Both measures, especially Fleiss' kappa, have a large sampling variance when estimated from a small number of readers, casting doubt on their accuracy given the number of readers typically used in current reliability studies.

Conclusion: Most current agreement studies are likely limited by the number of readers and are unlikely to produce a reliable estimate of reader agreement.
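To make the setup concrete, the following is a minimal sketch (not the authors' code) of this kind of simulation in Python/NumPy: binary decisions are generated from an assumed latent-signal model with reader-specific thresholds, and mean pair-wise agreement and Fleiss' kappa are computed; redrawing small reader panels shows how widely the estimates spread. All parameter values, distributions, and function names here are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def simulate_ratings(n_cases, n_readers, tau=0.5, sigma=0.5, seed=0):
    """Simulate binary ratings under a simple latent-trait model.

    Each case has a latent signal ~ N(0, 1); each reader applies a
    reader-specific threshold ~ N(0, tau^2) to the signal plus
    reader-case noise ~ N(0, sigma^2).  Illustrative parameterization.
    """
    rng = np.random.default_rng(seed)
    signal = rng.normal(0.0, 1.0, size=(n_cases, 1))
    thresholds = rng.normal(0.0, tau, size=(1, n_readers))
    noise = rng.normal(0.0, sigma, size=(n_cases, n_readers))
    return (signal + noise > thresholds).astype(int)

def pairwise_agreement(ratings):
    """Mean proportion of agreeing reader pairs per case."""
    n_cases, n_readers = ratings.shape
    pos = ratings.sum(axis=1)  # readers rating "positive" per case
    agree = pos * (pos - 1) + (n_readers - pos) * (n_readers - pos - 1)
    return float(np.mean(agree / (n_readers * (n_readers - 1))))

def fleiss_kappa(ratings):
    """Fleiss' kappa for binary ratings (cases x readers)."""
    n_cases, n_readers = ratings.shape
    pos = ratings.sum(axis=1)
    counts = np.stack([n_readers - pos, pos], axis=1)   # per-case category counts
    p_bar = np.mean((np.sum(counts ** 2, axis=1) - n_readers)
                    / (n_readers * (n_readers - 1)))    # observed agreement
    p_j = counts.sum(axis=0) / (n_cases * n_readers)    # category margins
    p_e = np.sum(p_j ** 2)                              # chance agreement
    return float((p_bar - p_e) / (1.0 - p_e))

if __name__ == "__main__":
    # Redraw reader panels (new thresholds each seed) to see how the
    # kappa estimate spreads for small numbers of readers.
    for n_readers in (3, 5, 10):
        kappas = [fleiss_kappa(simulate_ratings(500, n_readers, seed=s))
                  for s in range(200)]
        print(n_readers, round(np.mean(kappas), 3), round(np.std(kappas), 3))
```

Under these assumed parameters, the standard deviation of the kappa estimates across simulated panels shrinks as the number of readers grows, which is the kind of reader-sampling variability the abstract describes.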

Related Topics
Health Sciences; Medicine and Dentistry; Public Health and Health Policy