کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
402892 | 677025 | 2011 | 12 صفحه PDF | دانلود رایگان |

Research on the extraction of content relations from text corpora is a high-priority topic in natural language processing. This is not surprising since content relations form the backbone of any ontology, and ontologies are increasingly made use of in knowledge-based applications. However, so far most of the works focus on the detection of a restricted number of prominent verbal relations, including in particular is-a, has-part and cause. Our application, which aims to provide comprehensive, easy-to-understand content representations of complex functional objects described in patent claims, faces the need to derive a large number of content relations that cannot be limited a priori. To cope with this problem, we take advantage of the fact that deep syntactic dependency structures of sentences capture all relevant content relations—although without any abstraction. We implement thus a three-step strategy. First, we parse the claims to retrieve the deep syntactic dependency structures from which we then derive the content relations. Second, we generalize the obtained relations by clustering them according to semantic criteria, with the goal to unite all sufficiently similar relations. Finally, we identify a suitable name for each generalized relation. To keep the scope of the article within reasonable limits and to allow for a comparison with state-of-the-art techniques, we focus on verbal relations.
Journal: Knowledge-Based Systems - Volume 24, Issue 8, December 2011, Pages 1233–1244