Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6861439 | Knowledge-Based Systems | 2018 | 56 Pages |
Abstract
In this paper we present CrumbTrail, an algorithm to clean large and dense knowledge graphs. CrumbTrail removes cycles, out-of-domain nodes and non-essential nodes, i.e., those that can be safely removed without breaking the knowledge graph's connectivity. It achieves this through a bottom-up topological pruning on the basis of a set of input concepts that, for instance, a user can select in order to identify a domain of interest. Our technique can be applied both to noisy hypernymy graphs, typically generated by ontology learning algorithms as intermediate representations, and to crowdsourced resources like Wikipedia, in order to obtain clean, domain-focused concept hierarchies. CrumbTrail overcomes the time and space complexity limitations of current state-of-the-art algorithms. In addition, we show in a variety of experiments that it also outperforms them in tasks such as pruning automatically acquired taxonomy graphs and domain adaptation of the Wikipedia category graph.
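To make the idea of bottom-up pruning from a set of input concepts more concrete, the following is a minimal illustrative sketch. It is not the CrumbTrail algorithm itself, whose details are not given in the abstract. It assumes a hypernymy graph stored as a child-to-parents dictionary, and the function name `prune_bottom_up`, the toy graph, the seed set and the root are all hypothetical. The sketch climbs from the seed concepts, skips edges that would close a cycle, and discards nodes that cannot reach the root. It does not attempt the removal of non-essential nodes.

```python
def prune_bottom_up(hypernyms, seeds, root):
    """Illustrative only: return an acyclic child -> parents map that
    contains just the nodes lying on an upward path from a seed to root.

    hypernyms: dict mapping a node to an iterable of its hypernyms (parents).
    """
    kept = {}    # pruned graph: node -> list of retained parents
    state = {}   # node -> "active" (on the DFS stack) or "done"

    def visit(node):
        state[node] = "active"
        kept[node] = []
        for parent in sorted(hypernyms.get(node, ())):
            if state.get(parent) == "active":
                continue  # back edge: keeping it would close a cycle
            if parent not in state:
                visit(parent)
            kept[node].append(parent)
        state[node] = "done"

    # Bottom-up step: start from the seed concepts and climb the hypernym
    # edges. Nodes never visited are treated as out of domain.
    for seed in sorted(seeds):
        if seed not in state:
            visit(seed)

    # Keep only nodes from which the root is still reachable.
    reaches = {root} if root in kept else set()
    changed = True
    while changed:
        changed = False
        for node, parents in kept.items():
            if node not in reaches and any(p in reaches for p in parents):
                reaches.add(node)
                changed = True

    return {n: [p for p in ps if p in reaches]
            for n, ps in kept.items() if n in reaches}


# Hypothetical toy graph: the edge animal -> dog is noise that creates a
# cycle, and "hot dog" / "food" are outside the domain selected by the seed.
graph = {
    "poodle": ["dog"],
    "dog": ["mammal", "pet"],
    "mammal": ["animal"],
    "pet": ["animal"],
    "animal": ["entity", "dog"],
    "hot dog": ["food"],
    "food": ["entity"],
    "entity": [],
}
pruned = prune_bottom_up(graph, seeds={"poodle"}, root="entity")
```

In this toy run, `pruned` retains only the ancestors of `poodle`, and the noisy `animal -> dog` edge is dropped. The recursive traversal is chosen for readability; a graph of Wikipedia scale would need an iterative traversal.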
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Stefano Faralli, Irene Finocchi, Simone Paolo Ponzetto, Paola Velardi