Article code: 518408
Journal code: 867586
Publication year: 2013
English article: 10-page PDF
Full-text version: Free download
English title of the ISI article
A controlled greedy supervised approach for co-reference resolution on clinical text
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract

Identification of co-referent entity mentions inside text has significant importance for other natural language processing (NLP) tasks (e.g., event linking). However, this task, known as co-reference resolution, remains a complex problem, partly because of the confusion over different evaluation metrics and partly because the well-researched existing methodologies do not perform well on new domains such as clinical records. This paper presents a variant of the influential mention-pair model for co-reference resolution. Using a series of linguistically and semantically motivated constraints, the proposed approach controls generation of less-informative/sub-optimal training and test instances. The approach also introduces some aggressive greedy strategies in chain clustering. The proposed approach has been tested on the official test corpus of the recently held i2b2/VA 2011 challenge. It achieves an unweighted average F1 score of 0.895, calculated from multiple evaluation metrics (MUC, B³ and CEAF scores). These results are comparable to the best systems of the challenge. What makes our proposed system distinct is that it also achieves high average F1 scores for each individual chain type (Test: 0.897, Person: 0.852, Problem: 0.855, Treatment: 0.884). Unlike other works, it obtains good scores for each of the individual metrics rather than being biased towards a particular metric.
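
To make the evaluation described in the abstract more concrete, the following is a minimal sketch of how a B³ score and an unweighted average over several metric F1 values could be computed. It is not the challenge's official scorer. The function names (`f1`, `b_cubed`, `unweighted_average`) are hypothetical, chains are assumed to be lists of mention identifiers, and every mention is assumed to appear in both the gold (key) and system (response) partitions, with singletons given as one-element chains. MUC and CEAF are not shown.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def b_cubed(key_chains, response_chains):
    """Return B-cubed (precision, recall, F1) for two partitions of mentions.

    Assumption: scoring is restricted to mentions found in both partitions.
    """
    key_of = {m: frozenset(chain) for chain in key_chains for m in chain}
    resp_of = {m: frozenset(chain) for chain in response_chains for m in chain}
    mentions = key_of.keys() & resp_of.keys()
    if not mentions:
        return 0.0, 0.0, 0.0
    # Per-mention overlap between its gold chain and its system chain.
    precision = sum(
        len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions
    ) / len(mentions)
    return precision, recall, f1(precision, recall)


def unweighted_average(scores):
    """Plain arithmetic mean, e.g. over the MUC, B-cubed and CEAF F1 values."""
    return sum(scores) / len(scores)
```

Under these assumptions, `unweighted_average` would be called with the three per-metric F1 values to obtain a single summary figure of the kind reported in the abstract.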

Highlights
• Co-reference resolution detects which mentions refer to the same entity in a text.
• Our proposed approach obtains an average F1 score of 0.895 on the i2b2/VA 2011 data.
• Unlike other studies, it is not biased towards any particular metric or chain type.
• It exploits linguistic and semantic constraints for filtering non-co-referent pairs.
• It uses aggressive greedy strategies during chain clustering from co-referent pairs.
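
The last two highlights can be illustrated with a small sketch: candidate mention pairs are filtered by a constraint before classification, and the pairs judged co-referent are then merged greedily into chains. This is only an illustration under stated assumptions, not the authors' implementation. The same-type constraint in `compatible`, the mention dictionaries, and the function names are all hypothetical, and the greedy step is modelled here as simple transitive merging with a union-find structure, which may differ from the strategies used in the paper.

```python
from collections import defaultdict


def compatible(m1, m2):
    """Hypothetical constraint: only mentions of the same type may co-refer."""
    return m1["type"] == m2["type"]


def candidate_pairs(mentions):
    """Yield index pairs (i, j) with i < j that pass the constraint filter."""
    for j in range(len(mentions)):
        for i in range(j):
            if compatible(mentions[i], mentions[j]):
                yield i, j


def cluster_chains(n_mentions, coreferent_pairs):
    """Greedily merge mentions: every positive pair joins two chains."""
    parent = list(range(n_mentions))

    def find(x):
        # Follow parent links to the chain representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in coreferent_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    chains = defaultdict(list)
    for m in range(n_mentions):
        chains[find(m)].append(m)
    # Mentions left on their own are singletons, not chains.
    return sorted(c for c in chains.values() if len(c) > 1)
```

In a full system, a trained pairwise classifier would decide which of the pairs produced by `candidate_pairs` are co-referent, and only those would be passed to `cluster_chains`.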

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Biomedical Informatics - Volume 46, Issue 3, June 2013, Pages 506–515