Three empirical studies on the agreement of reviewers about the quality of software engineering experiments

Article code	Journal code	Publication year	English article	Full-text version
550376	872595	2012	16-page PDF	Free download

Keywords

Experimentation; Quality evaluation; Empirical studies; Software engineering

Related topics

Engineering and basic sciences; Computer engineering; Human-computer interaction

Abstract

Context: During systematic literature reviews it is necessary to assess the quality of empirical papers. Current guidelines suggest that two researchers should independently apply a quality checklist and that any disagreements must be resolved. However, there is little empirical evidence concerning the effectiveness of these guidelines.

Aims: This paper investigates three techniques that can be used to improve the reliability (i.e. the consensus among reviewers) of quality assessments: the number of reviewers, the use of a set of evaluation criteria, and consultation among reviewers. We undertook a series of studies to investigate these factors.

Method: Two studies involved four research papers and eight reviewers using a quality checklist with nine questions. The first study was based on individual assessments, the second on joint assessments with a period of inter-rater discussion. A third, more formal randomised block experiment involved 48 reviewers assessing two of the papers used previously, in teams of one, two and three persons, to assess the impact of discussion among teams of different sizes, using the evaluations of the “teams” of one person as a control.

Results: For the first two studies, the inter-rater reliability was poor for individual assessments but better for joint evaluations. However, the results of the third study contradicted the results of Study 2: inter-rater reliability was poor for all groups, and worse for teams of two or three than for individuals.

Conclusions: When performing quality assessments for systematic literature reviews, we recommend using three independent reviewers and adopting the median assessment. A quality checklist seems useful, but it is difficult to ensure that the checklist is both appropriate and understood by reviewers. Furthermore, future experiments should ensure participants are given more time to understand the quality checklist and to evaluate the research papers.
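The recommendation to use three independent reviewers and adopt the median assessment can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration: the 0 to 3 rating scale, the example scores, and the function names are hypothetical and do not come from the paper. The abstract also does not say whether the median is taken per checklist question or over each reviewer's total score, so the sketch shows both readings.

```python
from statistics import median


def median_per_question(scores_by_reviewer):
    """Return the median score for each checklist question across reviewers."""
    lengths = {len(scores) for scores in scores_by_reviewer}
    if len(lengths) != 1:
        raise ValueError("every reviewer must answer the same number of questions")
    # zip(*...) regroups the scores so that each tuple holds one question's ratings
    return [median(question) for question in zip(*scores_by_reviewer)]


def median_of_totals(scores_by_reviewer):
    """Return the median of the reviewers' total checklist scores."""
    return median(sum(scores) for scores in scores_by_reviewer)


# Hypothetical ratings (assumed 0-3 scale) from three independent reviewers
# on a nine-question quality checklist for a single paper.
reviewer_scores = [
    [3, 2, 2, 1, 3, 0, 2, 3, 1],
    [2, 2, 3, 1, 2, 1, 2, 3, 0],
    [3, 1, 2, 2, 3, 1, 1, 3, 1],
]

per_question = median_per_question(reviewer_scores)
total_median = median_of_totals(reviewer_scores)
print(per_question, total_median)
```

With an odd number of reviewers the median is always one of the ratings actually given, so a single outlying reviewer cannot pull the result away from the other two.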

Publisher

Database: Elsevier - ScienceDirect
Journal: Information and Software Technology - Volume 54, Issue 8, August 2012, Pages 804–819

Authors

Barbara Ann Kitchenham, Dag I.K. Sjøberg, Tore Dybå, Dietmar Pfahl, Pearl Brereton, David Budgen, Martin Höst, Per Runeson
