Article ID: 4946149
Journal: Knowledge-Based Systems
Published Year: 2017
Pages: 34
File Type: PDF
Abstract
This work introduces a novel technique for extracting the main concepts from a text. Concepts are described by word-based connections arranged in a semantic topological space built on a formal model, the simplicial complex. The complex links the points, i.e., the words appearing in the text, and incrementally creates a geometrical structure describing concepts that are more or less specialized, depending on the aggregation distance of the words. The conceptual network is context-aware, since it reveals unambiguous concepts, specialized by the analysis of the surrounding text. The framework that implements the approach discovers basic concepts, composed of the minimal number of words needed to describe a finite sense concept, and richer extended concepts, built by adding further relations among terms. The final topological space provides a multi-granule concept representation, from a local, word-closeness view to a highly refined description. Experiments and comparative analysis validate the effectiveness of the approach, showing satisfactory performance in concept identification: precision exceeds 80% in most of the experiments, and recall averages around 60-70%, with peaks of 90% for some specific concept categories.
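To make the idea of distance-dependent aggregation concrete, the following is a minimal sketch of a Vietoris-Rips style simplicial complex over word points. It is an illustration only and not the authors' framework: the 2-D coordinates, the word list, the Euclidean distance, and the function name `rips_complex` are all assumptions introduced here. A group of words forms a simplex when every pair in the group lies within the chosen radius, so a larger radius yields broader, less specialized groupings.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D positions for words (assumed for illustration);
# a real system would derive word positions from the analysed text.
WORDS = {
    "bank": (0.0, 0.0),
    "loan": (1.0, 0.0),
    "credit": (0.5, 0.8),
    "river": (5.0, 5.0),
}

def rips_complex(points, radius, max_dim=2):
    """Return all simplices up to dimension max_dim, as tuples of words.

    A tuple of words is a simplex when every pair of its words is at
    most `radius` apart (Vietoris-Rips rule). Single words are always
    included as 0-simplices.
    """
    names = sorted(points)
    simplices = []
    for k in range(1, max_dim + 2):  # k vertices give a (k-1)-simplex
        for combo in combinations(names, k):
            pairs = combinations(combo, 2)
            if all(dist(points[a], points[b]) <= radius for a, b in pairs):
                simplices.append(combo)
    return simplices

# A small radius links only the closest words; a larger one closes the triangle.
small = rips_complex(WORDS, radius=0.95)
large = rips_complex(WORDS, radius=1.1)
```

With these assumed coordinates, the smaller radius connects "credit" to both "bank" and "loan" but leaves "bank" and "loan" unlinked, while the larger radius adds that edge and the three words form a single triangle. "river" stays isolated at both radii.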
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence