Article code: 6856807
Journal code: 1437970
Publication year: 2018
English article: 25 pages, PDF
English title of the ISI article
A novel approach for entity resolution in scientific documents using context graphs
Persian translation of the title
یک رویکرد جدید برای حل مسئله در اسناد علمی با استفاده از نمودارهای متن
Keywords
Feature selection, entity resolution, context-based graph, support vector machines
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
Entity resolution refers to disambiguating and resolving entities in structured and unstructured data. The development of effective resolution algorithms is important for processing scientific documents, particularly biomedical literature. Specifically, name ambiguity among biomedical entities is a primary problem that needs to be solved in the knowledge extraction process. In this paper, we present a novel approach to disambiguating gene/protein names by using context graphs. A set of document abstracts is used to build the context graphs by uncovering the indirect co-occurrence relationships among words. Feature vectors of the graphs are constructed according to information gain (IG) on the word set. To evaluate the IG values, we propose a new metric that integrates word frequency (WF), dispersion degree (DD), and concentration degree (CD). Finally, entity resolution is performed by applying a support vector machine (SVM). Compared to existing approaches, the proposed method can discover latent information from the context of entity names, rather than using statistical information such as the number of occurrences of words. Based on the results of comprehensive experiments on two benchmark datasets, we conclude that the proposed method is promising for resolving ambiguous entities compared to several existing solutions.
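The abstract ranks words by information gain before classification. As a rough illustration only, the sketch below computes the textbook information gain of a binary "word occurs in the document" feature over labeled documents. It is not the paper's metric: the WF, DD, and CD terms are not defined in this record, so they are left out, and the function names and toy data are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, word):
    """Textbook IG of the binary feature 'word occurs in the document'.

    docs is a list of token sets, labels the matching class labels.
    """
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    total = len(labels)
    conditional = (
        (len(with_word) / total) * entropy(with_word)
        + (len(without_word) / total) * entropy(without_word)
    )
    return entropy(labels) - conditional


def top_words(docs, labels, k):
    """Return the k words with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab,
        key=lambda w: information_gain(docs, labels, w),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: the ambiguous name "cat" used as a gene or an animal.
docs = [
    {"cat", "protein", "binding"},
    {"cat", "enzyme"},
    {"cat", "pet", "animal"},
    {"cat", "pet"},
]
labels = ["gene", "gene", "animal", "animal"]
```

In this toy set, a word that appears in every document (such as "cat") carries no information about the sense, while a word that appears in exactly one sense (such as "pet") separates the classes completely. The selected words could then serve as feature dimensions for a classifier such as an SVM.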
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 432, March 2018, Pages 431-441