Article code: 1103139
Journal code: 953717
Publication year: 2012
English article: 13-page PDF
Full-text version: Free download
English title of the ISI article
Circularity effects in corpus studies – why annotations sometimes go round in circles
Related topics
Social Sciences and Humanities; Arts and Humanities; Language and Linguistics
English abstract

Linguistic corpus research mainly deals with annotated data rather than raw data. This contribution investigates the status of annotated corpus data in empirical linguistics.

We argue that annotators should be regarded as co-producers of data; annotations depend on certain theoretical categories, hence they are theory-laden. Annotation categories differ with respect to different (structural and functional) levels of description and different degrees of canonisation: annotating a corpus item as a noun at a structural level is a highly canonised decision in most cases, whereas the allocation of a cognitive-functional annotation category like "expression with identifiable referent" is subject to specific theories that often lack established definitions. As a minimal requirement, annotated data have to allow the reconstruction of the original raw data, and annotations should be constrained by guidelines so that the annotator's decisions are not arbitrary.

Annotation problems resulting from the close relation between annotation categories and their theoretical prerequisites are exemplified using a newspaper corpus study and a study on a second-language acquisition corpus, both studies dealing with anaphora as a discourse-functional phenomenon.

It is shown that the problems discussed have their origins in two circles: the first one results from the interplay of deductive and inductive procedures, through which theory has an impact on annotation; the second circle originates from the relations between language structures and their discourse functions, the latter not being observable independently of the structural features of the utterance.


► Annotation is a production rather than a documentation of corpus data.
► Functional categories cannot be annotated independently from structural features of the utterance.
► It should be transparent which theoretical assumptions have an impact on the annotations.
► Circularity: a theory is tested by means of data that have been generated on the basis of this theory.
► Annotation possibilities have to be constrained by heuristics based on plausibility.

Publisher
Database: Elsevier - ScienceDirect
Journal: Language Sciences - Volume 34, Issue 6, November 2012, Pages 702–714