Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
557995 | 1451694 | 2006 | 16-page PDF | Free download |
Lexico-semantic collocations (LSCs) are a prominent type of multiword expression. Over the last decade, the automatic compilation of LSCs from text corpora has been addressed in a considerable number of studies. Very often, however, the output of an LSC-extraction program is a plain list of LSCs. Although useful as raw material for dictionary construction, plain lists of LSCs are of rather limited use in NLP applications. For NLP, LSCs must be assigned syntactic and, especially, semantic information. Our goal is to develop an “off-the-shelf” LSC-acquisition program that annotates each LSC identified in the corpus with its syntax and semantics. In this article, we treat the annotation task as a classification task and approach it as a machine learning problem. The LSC typology we use is the set of lexical functions from Explanatory Combinatorial Lexicology; EuroWordNet serves as the lexico-semantic resource. The applied machine learning technique is a variant of the nearest-neighbor family, defined over lexico-semantic features of the elements of LSCs. The technique has been tested on Spanish verb–noun bigrams.
Journal: Computer Speech & Language - Volume 20, Issue 4, October 2006, Pages 609–624
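
The abstract frames LSC annotation as nearest-neighbor classification over lexico-semantic features of the collocation's elements. The following is a minimal sketch of that general idea, not the authors' actual algorithm. It makes several assumptions: the feature sets in `FEATURES` are invented stand-ins for what a resource such as EuroWordNet might provide, the training bigrams and their lexical-function labels are toy illustrations, and Jaccard overlap with majority voting is just one plausible member of the nearest-neighbor family. The names `bigram_features`, `jaccard`, and `classify` are hypothetical.

```python
from collections import Counter

# Hypothetical lexico-semantic features per word (invented for illustration;
# a real system would derive them from a resource such as EuroWordNet).
FEATURES = {
    "dar": {"transfer", "act"},
    "hacer": {"create", "act"},
    "tomar": {"take", "act"},
    "provocar": {"cause", "change"},
    "causar": {"cause", "change"},
    "paseo": {"motion", "activity"},
    "pregunta": {"communication", "activity"},
    "decisión": {"cognition", "activity"},
    "crisis": {"state", "event"},
    "daño": {"state", "harm"},
    "problema": {"state", "cognition"},
    "viaje": {"motion", "activity"},
}

# Toy training set: Spanish verb-noun bigrams with illustrative
# lexical-function labels (assumed, not taken from the article).
TRAINING = [
    (("dar", "paseo"), "Oper1"),
    (("hacer", "pregunta"), "Oper1"),
    (("tomar", "decisión"), "Oper1"),
    (("provocar", "crisis"), "CausFunc0"),
    (("causar", "daño"), "CausFunc0"),
]


def bigram_features(verb, noun):
    """Combine verb and noun features, prefixed so the two roles stay distinct."""
    verb_part = {f"v:{f}" for f in FEATURES[verb]}
    noun_part = {f"n:{f}" for f in FEATURES[noun]}
    return verb_part | noun_part


def jaccard(a, b):
    """Jaccard similarity between two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(verb, noun, k=3):
    """Label a bigram by majority vote among its k most similar training bigrams."""
    query = bigram_features(verb, noun)
    scored = sorted(
        ((jaccard(query, bigram_features(*bigram)), label)
         for bigram, label in TRAINING),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


print(classify("hacer", "viaje"))      # Oper1
print(classify("causar", "problema"))  # CausFunc0
```

In this toy setup, an unseen bigram such as "hacer viaje" shares verb and noun features with the support-verb examples and therefore inherits their label. A realistic system would need a much larger annotated set and features drawn from the actual lexical resource.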