Article code: 518219
Journal code: 867566
Publication year: 2013
English article: 16-page PDF
Full-text version: free download
English title of the ISI article
Approaches to verb subcategorization for biomedicine
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Applications
Abstract

Information about verb subcategorization frames (SCFs) is important to many tasks in natural language processing (NLP) and, in turn, text mining. Biomedicine has a need for high-quality SCF lexicons to support the extraction of information from the biomedical literature, which helps biologists to take advantage of the latest biomedical knowledge despite the overwhelming growth of that literature. Unfortunately, techniques for creating such resources for biomedical text are relatively undeveloped compared to general language. This paper serves as an introduction to subcategorization and existing approaches to acquisition, and provides motivation for developing techniques that address issues particularly important to biomedical NLP. First, we give the traditional linguistic definition of subcategorization, along with several related concepts. Second, we describe approaches to learning SCF lexicons from large data sets for general and biomedical domains. Third, we consider the crucial issue of linguistic variation between biomedical fields (subdomain variation). We demonstrate significant variation among subdomains, and find the variation does not simply follow patterns of general lexical variation. Finally, we note several requirements for future research in biomedical SCF lexicon acquisition: a high-quality gold standard, investigation of different definitions of subcategorization, and minimally-supervised methods that can learn subdomain-specific lexical usage without the need for extensive manual work.

Highlights
► Verb subcategorization (SCF) is an important phenomenon in biomedical text mining.
► Two state-of-the-art systems for SCF acquisition have limitations.
► There is significant variation in SCF behavior between biomedical subdomains.
► These facts point to a need for new, less supervised acquisition methods.
► A key first step is to build a biomedicine gold standard for evaluation.

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Biomedical Informatics - Volume 46, Issue 2, April 2013, Pages 212–227