Minimally-supervised extraction of domain-specific part–whole relations using Wikipedia as knowledge-base

Article ID	Journal	Published Year	Pages	File Type
378809	Data & Knowledge Engineering	2013	23 Pages	PDF

Abstract

We present a minimally-supervised approach for learning part–whole relations from texts. Unlike previous techniques, we focus on sparse, domain-specific texts. The novelty of our approach lies in the use of Wikipedia as a knowledge base, from which a minimally-supervised algorithm first acquires a set of reliable patterns that express part–whole relations. We then use the acquired patterns to extract part–whole relation triples from a collection of sparse, domain-specific texts. Our strategy of learning in one domain and applying the knowledge in another is based on the notion of domain adaptation. It allows us to overcome the challenges of learning the relations directly from the sparse, domain-specific corpus. Our experimental evaluations reveal that, despite its general-purpose nature, Wikipedia can be exploited as a source of knowledge for improving the performance of domain-specific part–whole relation extraction. We make three further contributions. First, we propose a mechanism that mitigates the negative impact of semantic drift on minimally-supervised algorithms. Second, we represent the patterns in the extracted relations using sophisticated syntactic structures that avoid the limitations of traditional surface-string representations. Third, we show that domain-specific part–whole relations cannot be conclusively classified in existing taxonomies.
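
The two-stage idea described in the abstract (acquire patterns from a general corpus, then apply them to a domain corpus) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes plain surface-string patterns, which the paper explicitly replaces with syntactic structures, and it uses a simple minimum-support filter as a stand-in for the paper's semantic-drift mechanism, whose details the abstract does not give. The function names `learn_patterns` and `extract_triples`, the `min_support` parameter, and all sentences are hypothetical.

```python
import re
from collections import defaultdict


def learn_patterns(sentences, seeds, min_support=2):
    """Collect the text linking a seed (part, whole) pair in each sentence.

    A pattern is kept only if at least `min_support` distinct seed pairs
    produced it (an assumed, simplistic guard against unreliable patterns).
    """
    support = defaultdict(set)
    for sentence in sentences:
        for part, whole in seeds:
            # Try both word orders: whole before part, and part before whole.
            for first, second, order in (
                (whole, part, "whole_first"),
                (part, whole, "part_first"),
            ):
                m = re.search(
                    rf"\b{re.escape(first)}\b(.+?)\b{re.escape(second)}\b",
                    sentence,
                )
                if m and m.group(1).strip():
                    support[(m.group(1).strip(), order)].add((part, whole))
    return {p for p, pairs in support.items() if len(pairs) >= min_support}


def extract_triples(sentences, patterns):
    """Apply learned patterns to new text; return (part, pattern, whole) triples."""
    triples = set()
    for sentence in sentences:
        for connector, order in patterns:
            m = re.search(rf"(\w+) {re.escape(connector)} (\w+)", sentence)
            if m:
                a, b = m.group(1), m.group(2)
                part, whole = (b, a) if order == "whole_first" else (a, b)
                triples.add((part, connector, whole))
    return triples


# Toy stand-in for the general-purpose corpus (Wikipedia in the paper).
general_corpus = [
    "cars contain engines",
    "bicycles contain wheels",
    "engines are part of cars",
    "wheels are part of bicycles",
    "cars need engines",  # seen with one seed pair only, so it is filtered out
]
seeds = {("engines", "cars"), ("wheels", "bicycles")}

# Toy stand-in for the sparse, domain-specific corpus.
domain_corpus = [
    "pumps contain impellers",
    "bearings are part of gearboxes",
    "operators inspect valves",
]

patterns = learn_patterns(general_corpus, seeds)
triples = extract_triples(domain_corpus, patterns)
```

In this sketch the domain corpus never contributes to pattern learning, which mirrors the motivation stated in the abstract: a sparse corpus offers too few repeated contexts to learn reliable patterns from directly.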

Keywords

Relation extraction; Knowledge management applications; Text mining; Ontology learning