Article ID | Journal | Published Year | Pages |
---|---|---|---|
554733 | Decision Support Systems | 2014 | 16 |
• We present a semantic approach for learning domain taxonomies from text.
• Word sense disambiguation is applied to text and to existing taxonomies.
• We refine the subsumption method for term relations to include concept semantics.
• We define new semantic measures for evaluating the constructed taxonomies.
• Our method performs well in capturing the broader–narrower inter-concept relation.
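For readers unfamiliar with the subsumption method that the third highlight refines, the following is a minimal sketch of the classic document-level co-occurrence test popularized by Sanderson and Croft: concept x subsumes concept y when P(x | y) ≥ t and P(y | x) < 1. It is not the refined ATCT variant. It assumes that each document has already been reduced to a set of disambiguated concept labels; the function name `subsumption_pairs`, the sense-tagged labels, and the default threshold t = 0.8 are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def subsumption_pairs(docs, threshold=0.8):
    """Hypothetical helper: return sorted (broader, narrower) concept pairs
    using the classic document-level subsumption test:
    x subsumes y when P(x | y) >= threshold and P(y | x) < 1.

    `docs` is an iterable of collections of concept labels, one per document.
    """
    doc_freq = Counter()  # number of documents containing each concept
    co_freq = Counter()   # number of documents containing both concepts of a pair
    for doc in docs:
        concepts = set(doc)
        doc_freq.update(concepts)
        for a, b in combinations(sorted(concepts), 2):
            co_freq[(a, b)] += 1

    pairs = []
    for (a, b), n_ab in co_freq.items():
        # Test both directions of each co-occurring pair.
        for x, y in ((a, b), (b, a)):
            p_x_given_y = n_ab / doc_freq[y]
            p_y_given_x = n_ab / doc_freq[x]
            if p_x_given_y >= threshold and p_y_given_x < 1:
                pairs.append((x, y))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical sense-tagged concept labels, one set per document.
    docs = [
        {"economics.n.01", "inflation.n.01"},
        {"economics.n.01", "inflation.n.01", "interest_rate.n.01"},
        {"economics.n.01", "unemployment.n.01"},
        {"economics.n.01", "interest_rate.n.01"},
    ]
    for broader, narrower in subsumption_pairs(docs):
        print(f"{broader} > {narrower}")
```

On this toy input, `economics.n.01` is placed above the other three concepts. `inflation.n.01` and `interest_rate.n.01` co-occur in only one of their two documents each, so neither subsumes the other. The `P(y | x) < 1` guard keeps two concepts that always appear together from being placed above each other.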
In this paper we present a framework for automatically building a domain taxonomy from text corpora, called Automatic Taxonomy Construction from Text (ATCT). The framework comprises four steps. First, terms are extracted from a corpus of documents. Second, a filtering approach selects the extracted terms that are most relevant to a specific domain. Third, the selected terms are disambiguated with a word sense disambiguation technique, and concepts are generated. Finally, the broader–narrower relations between concepts are determined with a subsumption technique that uses concept co-occurrences in text. For evaluation, we assess the performance of the ATCT framework using semantic precision, semantic recall, and the taxonomic F-measure, all of which take concept semantics into account. The proposed framework is evaluated in the domain of economics and management as well as in the medical domain.
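As a point of comparison for the evaluation step, the following is a minimal sketch of a plain edge-overlap baseline. It is not the semantic precision and recall defined for ATCT. The sketch compares a learned set of (broader, narrower) relations with a gold-standard set. It assumes, as is common for the taxonomic F-measure, that F is the harmonic mean of precision and recall. The function name `edge_precision_recall_f` and the example relations are hypothetical.

```python
def edge_precision_recall_f(learned, gold):
    """Hypothetical baseline: precision, recall, and F-measure over
    (broader, narrower) relation pairs, matched as exact strings.

    Unlike measures that account for concept semantics, this treats labels
    as plain strings, so two labels naming the same concept never match.
    """
    learned, gold = set(learned), set(gold)
    hits = len(learned & gold)  # relations present in both taxonomies
    precision = hits / len(learned) if learned else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure as the harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    learned = {("economics", "inflation"), ("economics", "interest_rate"),
               ("economics", "unemployment")}
    gold = {("economics", "inflation"), ("economics", "unemployment"),
            ("economics", "monetary_policy"), ("monetary_policy", "interest_rate")}
    p, r, f = edge_precision_recall_f(learned, gold)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # prints P=0.667 R=0.500 F=0.571
```

Exact string matching keeps the baseline easy to audit. It is also the limitation that semantics-aware measures are meant to address, since differently worded labels for the same concept count as misses.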