Developing a corpus of clinical notes manually annotated for part-of-speech

Article code	Journal code	Year of publication	English article	Full-text version
517382	1449218	2006	12-page PDF	Free download

Keywords

Medical domain; Domain adaptation; Text analysis; Natural Language Processing

Related subjects

Engineering and Basic Sciences; Computer Engineering; Computer Science Software

Abstract

Purpose: This paper presents a project whose main goal is to construct a corpus of clinical text manually annotated for part-of-speech (POS) information. We describe and discuss the process of training three domain experts to perform linguistic annotation.

Methods: Three domain experts were trained to perform manual annotation of a corpus of clinical notes. A part of this corpus was combined with the Penn Treebank corpus of general purpose English text and another part was set aside for testing. The corpora were then used for training and testing statistical part-of-speech taggers. We list some of the challenges as well as encouraging results pertaining to inter-rater agreement and consistency of annotation.

Results: We used the Trigrams'n'Tags (TnT) [T. Brants, TnT - a statistical part-of-speech tagger, In: Proceedings of NAACL/ANLP-2000 Symposium, 2000] tagger trained on general English data to achieve 89.79% correctness. The same tagger trained on a portion of the medical data annotated for this project improved the performance to 94.69%. Furthermore, we find that discriminating between different types of discourse represented by different sections of clinical text may be very beneficial to improve correctness of POS tagging.

Conclusion: Our preliminary experimental results indicate the necessity for adapting state-of-the-art POS taggers to the sublanguage domain of clinical text.
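The abstract reports two kinds of figures: tagger correctness on held-out clinical text and inter-rater agreement among the annotators. The sketch below illustrates how such figures are commonly computed. It is not the authors' evaluation code. The abstract does not say which agreement statistic was used, so Cohen's kappa is an assumption here, and the function names `tagging_accuracy` and `cohen_kappa` and the toy tag sequences are hypothetical.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold-standard tag."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("tag sequences must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def cohen_kappa(tags_a, tags_b):
    """Chance-corrected agreement between two annotators on the same tokens.

    Assumption: Cohen's kappa stands in for whatever agreement measure
    the paper actually reports.
    """
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("tag sequences must be non-empty and of equal length")
    n = len(tags_a)
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    # Agreement expected if each annotator assigned tags at random
    # according to their own tag distribution.
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical tag throughout
    return (observed - expected) / (1.0 - expected)


# Toy data (invented for illustration, using Penn Treebank style tags).
gold = ["NN", "VBD", "DT", "NN"]
pred = ["NN", "VBD", "DT", "JJ"]
print(tagging_accuracy(gold, pred))

annotator_1 = ["NN", "NN", "VB", "VB"]
annotator_2 = ["NN", "NN", "VB", "NN"]
print(cohen_kappa(annotator_1, annotator_2))
```

Token-level accuracy corresponds to the "correctness" percentages quoted in the Results. A chance-corrected statistic is useful alongside it because a few very frequent tags can inflate raw agreement between annotators.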

Publisher

Database: Elsevier - ScienceDirect
Journal: International Journal of Medical Informatics - Volume 75, Issue 6, June 2006, Pages 418–429

Authors

Serguei V. Pakhomov, Anni Coden, Christopher G. Chute
