Domain-specific language models and lexicons for tagging

Article ID	Journal ID	Year	English article
10355820	867543	2005	9-page PDF

Keywords

Domain adaptation; Corpus linguistics; Clinical information systems; Hidden Markov model

Abstract

Accurate and reliable part-of-speech tagging is useful for many Natural Language Processing (NLP) tasks that form the foundation of NLP-based approaches to information retrieval and data mining. In general, large annotated corpora are needed to reach the desired tagging accuracy. We show that a large annotated general-English corpus is not sufficient for building a part-of-speech tagger model adequate for tagging documents from the medical domain. However, adding a fairly small domain-specific corpus to a large general-English one raised accuracy in our studies from 87% to over 92%. We also suggest a number of characteristics that quantify the similarity between a training corpus and the test data. These results give guidance on creating an appropriate corpus for building a part-of-speech tagger model that achieves satisfactory accuracy on a new domain at relatively small cost.
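The abstract does not describe how the tagger is implemented; the keywords only mention a hidden Markov model. The sketch below is a hypothetical illustration of the corpus-mixing idea, assuming a first-order (bigram) HMM tagger with add-one smoothing and Viterbi decoding, and using the out-of-vocabulary (OOV) rate as one simple measure of how well a training corpus covers the test data. It is not the authors' tagger, their similarity characteristics, or their data; the toy corpora and every function name here are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Each corpus is a list of sentences; each sentence is a list of (word, tag) pairs.


def oov_rate(train, test):
    """Fraction of test tokens whose lowercased form never occurs in train.
    (Illustrative similarity measure; an assumption, not the paper's definition.)"""
    vocab = {w.lower() for sent in train for w, _ in sent}
    tokens = [w.lower() for sent in test for w, _ in sent]
    return sum(w not in vocab for w in tokens) / len(tokens)


class BigramHMMTagger:
    """Hypothetical first-order HMM tagger: add-one smoothing, Viterbi decoding."""

    def __init__(self, train):
        self.trans = defaultdict(Counter)  # previous tag -> counts of next tag
        self.emit = defaultdict(Counter)   # tag -> counts of word
        vocab = set()
        for sent in train:
            prev = "<s>"  # sentence-start pseudo-tag
            for word, tag in sent:
                w = word.lower()
                self.trans[prev][tag] += 1
                self.emit[tag][w] += 1
                vocab.add(w)
                prev = tag
        self.tags = sorted(self.emit)
        self.vocab_size = len(vocab)
        self.trans_tot = {p: sum(c.values()) for p, c in self.trans.items()}
        self.emit_tot = {t: sum(c.values()) for t, c in self.emit.items()}

    def log_trans(self, prev, tag):
        count = self.trans.get(prev, Counter())[tag]
        return math.log((count + 1) / (self.trans_tot.get(prev, 0) + len(self.tags)))

    def log_emit(self, tag, word):
        # The extra +1 in the denominator reserves probability mass for unseen words.
        count = self.emit[tag][word]
        return math.log((count + 1) / (self.emit_tot[tag] + self.vocab_size + 1))

    def tag(self, words):
        if not words:
            return []
        ws = [w.lower() for w in words]
        # best[t] = (log score, tag path) of the best path ending in tag t
        best = {t: (self.log_trans("<s>", t) + self.log_emit(t, ws[0]), [t])
                for t in self.tags}
        for w in ws[1:]:
            best = {
                t: max((score + self.log_trans(p, t) + self.log_emit(t, w), path + [t])
                       for p, (score, path) in best.items())
                for t in self.tags
            }
        return max(best.values())[1]


def accuracy(tagger, test):
    """Token-level accuracy of predicted tags against gold tags."""
    correct = total = 0
    for sent in test:
        pred = tagger.tag([w for w, _ in sent])
        correct += sum(p == g for p, (_, g) in zip(pred, sent))
        total += len(sent)
    return correct / total


# Toy data (invented): a "general" corpus, a tiny "clinical" corpus, and clinical test text.
general_corpus = [[("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
                  [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]
domain_corpus = [[("the", "DT"), ("patient", "NN"), ("denies", "VBZ"), ("pain", "NN")]]
clinical_test = [[("the", "DT"), ("patient", "NN"), ("denies", "VBZ"), ("fever", "NN")]]

print(oov_rate(general_corpus, clinical_test))                  # 0.75
print(oov_rate(general_corpus + domain_corpus, clinical_test))  # 0.25
mixed_tagger = BigramHMMTagger(general_corpus + domain_corpus)
print(accuracy(mixed_tagger, clinical_test))
```

On this toy data, adding the small domain corpus lowers the OOV rate of the test text from 0.75 to 0.25, which mirrors, in miniature, the abstract's point that a small in-domain sample can close much of the coverage gap. A real experiment would need realistic corpora and held-out evaluation; this sketch only shows the mechanics.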

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of Biomedical Informatics - Volume 38, Issue 6, December 2005, Pages 422-430

Authors

Anni R. Coden, Serguei V. Pakhomov, Rie K. Ando, Patrick H. Duffy, Christopher G. Chute
