Article code: 515484
Journal code: 867026
Publication year: 2010
English article: 12-page PDF
Full-text version: Free download
English title of the ISI article
Annotation and verification of sense pools in OntoNotes
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract

This paper describes OntoNotes, a multilingual (English, Chinese, and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes groups word senses into so-called sense pools, i.e., sets of near-synonymous word senses. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Although a sense pool provides a set of near-synonymous senses, it does not indicate whether two words in a pool are interchangeable in practical use. This paper therefore devises an unsupervised algorithm that combines Google n-grams with a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features measure the degree of context mismatch caused by a substitution, and the statistical test then decides, based on that degree of mismatch, whether the substitution is adequate. The proposed method is compared with a supervised method, Linear Discriminant Analysis (LDA). Experimental results show that the proposed unsupervised method achieves performance comparable to that of the supervised method.
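
The abstract does not give the exact n-gram features or name the statistical test, so the following is only a minimal Python sketch of the general idea, not the paper's method. It assumes a small in-memory dictionary `counts` standing in for Google n-gram frequencies, a mismatch score defined as the mean add-one-smoothed log-ratio of original to substituted n-gram counts, and a one-sided z-score threshold as a stand-in for the statistical test. All function names and parameters here are hypothetical.

```python
from math import log
from statistics import mean, stdev


def context_ngrams(tokens, index, n):
    """Return every n-gram of order n in `tokens` that covers position `index`."""
    grams = []
    for start in range(index - n + 1, index + 1):
        if start >= 0 and start + n <= len(tokens):
            grams.append(tuple(tokens[start:start + n]))
    return grams


def mismatch(tokens, index, candidate, counts, orders=(2, 3)):
    """Assumed mismatch score: mean log-ratio of original vs. substituted counts.

    Counts are add-one smoothed so unseen n-grams do not cause division by zero.
    Larger values mean the candidate fits the context worse than the original.
    """
    substituted = tokens[:index] + [candidate] + tokens[index + 1:]
    ratios = []
    for n in orders:
        pairs = zip(context_ngrams(tokens, index, n),
                    context_ngrams(substituted, index, n))
        for orig_gram, sub_gram in pairs:
            ratios.append(log((counts.get(orig_gram, 0) + 1) /
                              (counts.get(sub_gram, 0) + 1)))
    return mean(ratios) if ratios else 0.0


def is_substitutable(score, reference_scores, z_critical=1.645):
    """Assumed decision rule: accept unless the score is significantly high.

    `reference_scores` are mismatch scores of substitutions taken to be
    acceptable; 1.645 is the one-sided 5% critical value of the standard normal.
    """
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)
    if sigma == 0:
        return score <= mu
    return (score - mu) / sigma <= z_critical


# Toy counts (made up for illustration, not real Google n-gram data).
toy_counts = {
    ("a", "strong"): 50, ("strong", "tea"): 100, ("a", "strong", "tea"): 20,
    ("a", "powerful"): 50,
}
sentence = ["a", "strong", "tea"]
score = mismatch(sentence, 1, "powerful", toy_counts)
print(is_substitutable(score, [0.0, 0.2, -0.1, 0.1, 0.05]))
```

Because the decision depends only on n-gram counts and a reference distribution of mismatch scores, a rule of this kind needs no labeled training data, which is the property the abstract contrasts with the supervised LDA baseline.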

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 46, Issue 4, July 2010, Pages 436–447