Article ID | Journal ID | Publication year | English article | Full-text version
10358522 | 868524 | 2015 | 11 pages, PDF | Free download
English title of the ISI article
Identifying entities from scientific publications: A comparison of vocabulary- and model-based methods
Persian translation of the title
شناسایی اشخاص از نشریات علمی: مقایسه روش های مبتنی بر واژگان و مدل
Keywords
Entity extraction, vocabulary, dictionary, conditional random fields, content-aware
Related topics
Engineering and basic sciences; Computer engineering; Computer science software
English abstract
The objective of this study is to evaluate the performance of five entity extraction methods for the task of identifying entities from scientific publications, including two vocabulary-based methods (a keyword-based and a Wikipedia-based) and three model-based methods (conditional random fields (CRF), CRF with keyword-based dictionary, and CRF with Wikipedia-based dictionary). These methods are applied to an annotated test set of publications in computer science. Precision, recall, accuracy, area under the ROC curve, and area under the precision-recall curve are employed as the evaluative indicators. Results show that the model-based methods outperform the vocabulary-based ones, among which CRF with keyword-based dictionary has the best performance. Between the two vocabulary-based methods, the keyword-based one has a higher recall and the Wikipedia-based one has a higher precision. The findings of this study help inform the understanding of informetric research at a more granular level.
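As a rough illustration of the ideas named in the abstract, the sketch below tags tokens with a vocabulary-based lookup and scores the result with precision, recall, and accuracy. It is not the authors' implementation: the function names, the single-token lowercase matching rule, and the toy sentence with its gold labels are all assumptions made for this example.

```python
from typing import List, Set, Tuple


def tag_with_vocabulary(tokens: List[str], vocabulary: Set[str]) -> List[int]:
    """Label a token 1 if its lowercase form is in the vocabulary, else 0.

    Assumption: entities are single tokens; real vocabularies also hold phrases.
    """
    return [1 if tok.lower() in vocabulary else 0 for tok in tokens]


def precision_recall_accuracy(
    gold: List[int], predicted: List[int]
) -> Tuple[float, float, float]:
    """Token-level precision, recall, and accuracy for binary labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Toy data (hypothetical): gold marks "CRF" and "Wikipedia" as entities.
tokens = ["We", "train", "a", "CRF", "on", "Wikipedia", "text"]
gold = [0, 0, 0, 1, 0, 1, 0]
vocabulary = {"crf", "text"}  # hypothetical keyword list

predicted = tag_with_vocabulary(tokens, vocabulary)
precision, recall, accuracy = precision_recall_accuracy(gold, predicted)
print(predicted, precision, recall, round(accuracy, 3))
```

On this toy input the vocabulary finds one true entity, misses one, and wrongly tags one ordinary word, which shows how a broad vocabulary can trade precision for recall.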
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Informetrics - Volume 9, Issue 3, July 2015, Pages 455-465