Article code: 4973695
Journal code: 1451684
Publication year: 2017
English article: 22-page PDF
Full-text version: Free download
English title of the ISI article
Multi-domain evaluation framework for named entity recognition tools
Persian translation of the title
چارچوب ارزیابی چند دامنه برای ابزارهای تشخیص موجودیت نام
Keywords
Named entity recognition, multi-domain evaluation, qualitative data analysis, benchmark evaluation
Related topics
Engineering and basic sciences, Computer engineering, Signal processing
English abstract
Extracting structured information from unstructured text is important for qualitative data analysis. Leveraging NLP techniques for qualitative data analysis can accelerate the annotation process, allow for large-scale analysis and provide more insight into the text to improve performance. The first step in gaining insight from text is Named Entity Recognition (NER). A significant challenge that directly impacts the performance of the NER process is the domain diversity of qualitative data. Text varies by domain in many respects, including taxonomies, length, formality and format. In this paper we discuss and analyse the performance of state-of-the-art tools across domains to assess their robustness and reliability. To do so, we developed a standard, expandable and flexible framework to analyse and test tool performance using corpora representing text from various domains. We performed extensive analysis and comparison of tools across various domains and from various perspectives. The resulting comparison and analysis provide a holistic picture of the state-of-the-art tools.
Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 43, May 2017, Pages 34-55
Authors
, , ,