Article ID: 561748
Journal ID: 1451607
Publication year: 2016
English article: 15 pages
Full-text version: PDF
English title of the ISI article
LHD 2.0: A text mining approach to typing entities in knowledge graphs
Keywords
Type inference; Support vector machines; Entity classification; DBpedia
Related subjects
Engineering and Basic Sciences; Computer Engineering; Information Systems
English abstract

The type of the entity being described is one of the key pieces of information in linked data knowledge graphs. In this article, we introduce a novel type inference technique that extracts types from the free-text description of an entity by combining lexico-syntactic pattern analysis with supervised classification. For lexico-syntactic (Hearst) pattern-based extraction, we use our previously published Linked Hypernyms Dataset Framework. Its output is mapped to the DBpedia Ontology using exact string matching, complemented with a novel co-occurrence-based algorithm, STI. This algorithm maps classes appearing in one knowledge graph to a different set of classes appearing in another knowledge graph, provided that the two graphs contain a common set of typed instances. The supervised results are obtained from a hierarchy of Support Vector Machine classifiers (hSVM) trained on the bag-of-words representation of the short abstracts and categories of Wikipedia articles. The results of both approaches are probabilistically fused. For evaluation, we created a gold-standard dataset covering over 2000 DBpedia entities using a commercial crowdsourcing service. The hierarchical precision of our hSVM and STI approaches is comparable to that of SDType, the current state-of-the-art type inference algorithm, while the set of instances they can type is largely complementary to SDType's, because our algorithms do not require semantic properties in the knowledge graph to type an instance. The paper also provides a comprehensive evaluation of type assignment in DBpedia in terms of hierarchical precision, recall and exact match with the gold standard. A dataset generated by a version of the presented approach is included in DBpedia 2015.
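The abstract reports results as hierarchical precision, hierarchical recall and exact match against the gold standard. The Python sketch below shows one way to compute such hierarchy-aware scores. It assumes the commonly used ancestor-augmented definition, in which each predicted and gold type is extended with all of its superclasses before the sets are compared, and it micro-averages over instances. The paper's exact variant may differ. The toy hierarchy, `PARENT`, `with_ancestors`, `hierarchical_scores` and the sample data are all hypothetical illustrations, loosely named after DBpedia Ontology classes.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy fragment of a class hierarchy (child -> parent).
# Roots map to None. Illustrative only, not the real DBpedia Ontology.
PARENT: Dict[str, Optional[str]] = {
    "Agent": None,
    "Person": "Agent",
    "Athlete": "Person",
    "SoccerPlayer": "Athlete",
    "Artist": "Person",
    "Place": None,
    "Settlement": "Place",
}


def with_ancestors(cls: str) -> Set[str]:
    """Return the class together with all of its superclasses."""
    result: Set[str] = set()
    node: Optional[str] = cls
    while node is not None:
        result.add(node)
        node = PARENT[node]
    return result


def hierarchical_scores(predicted: List[str], gold: List[str]) -> Tuple[float, float, float]:
    """Micro-averaged hierarchical precision, hierarchical recall and exact match.

    predicted[i] and gold[i] are the most specific types assigned to instance i.
    """
    overlap = pred_total = gold_total = exact = 0
    for p, g in zip(predicted, gold):
        p_set, g_set = with_ancestors(p), with_ancestors(g)
        overlap += len(p_set & g_set)   # shared classes, including shared ancestors
        pred_total += len(p_set)
        gold_total += len(g_set)
        exact += int(p == g)            # most specific types identical
    return overlap / pred_total, overlap / gold_total, exact / len(gold)


if __name__ == "__main__":
    predicted = ["Athlete", "Settlement", "Artist"]
    gold = ["SoccerPlayer", "Settlement", "Athlete"]
    hp, hr, em = hierarchical_scores(predicted, gold)
    print(f"hP={hp:.3f} hR={hr:.3f} exact={em:.3f}")
```

With this toy data, predicting `Athlete` for a `SoccerPlayer` still earns credit for the shared ancestors `Athlete`, `Person` and `Agent`. That partial credit is why hierarchical scores are more informative than exact match alone when evaluating type assignment against an ontology.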

Publisher
Database: Elsevier - ScienceDirect
Journal: Web Semantics: Science, Services and Agents on the World Wide Web - Volume 39, August 2016, Pages 47–61