Article code | Journal code | Publication year | English article | Full-text version
6900927 | 1446491 | 2018 | 10-page PDF | Free download
English title of the ISI article
Building Sense Tagged Corpus Using Wikipedia for Supervised Word Sense Disambiguation
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract
Building sense-tagged data is a main challenge for supervised techniques, which have achieved promising results in word sense disambiguation. Building sense-tagged data manually is labor-intensive and time-consuming because each ambiguous word has to be labeled in its collected contexts by linguistic experts. Therefore, this paper proposes a knowledge-based method for building an Arabic sense-tagged corpus from Wikipedia. The method starts by mapping Arabic WordNet to Wikipedia to select the Wikipedia article that corresponds to each sense in WordNet. In this mapping step, a cross-lingual method is used to measure the similarity between the features of a Wikipedia article and those of a WordNet sense separately. Then, the incoming links of Wikipedia articles are exploited to extract instances for the sense of each ambiguous word in WordNet. To handle the lack of instances for some Wikipedia articles, a multiword-based technique is proposed to increase the number of instances for each concept. Experimental results show that the cross-lingual method outperforms a monolingual method based on Arabic features only. The sense-tagged corpus is created for 50 ambiguous words, yielding 148 senses with 30,961 instances.
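The two core steps described in the abstract (mapping a WordNet sense to a Wikipedia article, then harvesting instances from incoming links) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or the similarity measure, so the sketch assumes bag-of-words features compared by cosine similarity, and it uses toy English data rather than Arabic. The function names `map_sense_to_article` and `extract_instances`, and the data layout (sentences paired with an anchor-text-to-article dictionary), are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def map_sense_to_article(sense_features, candidate_articles):
    """Pick the candidate article whose features are most similar to the sense.

    sense_features: list of feature words for one WordNet sense (assumed layout).
    candidate_articles: dict mapping article title -> list of feature words.
    """
    sense_bag = Counter(sense_features)
    scores = {
        title: cosine(sense_bag, Counter(features))
        for title, features in candidate_articles.items()
    }
    return max(scores, key=scores.get)


def extract_instances(article_title, sentences, target_word):
    """Collect sentences in which the target word links to the given article.

    sentences: list of (text, links) pairs, where links maps anchor text to
    the title of the linked article (assumed layout for incoming links).
    """
    return [
        (text, article_title)
        for text, links in sentences
        if links.get(target_word) == article_title
    ]


# Toy data, invented for illustration only.
candidates = {
    "Bank (finance)": ["money", "institution", "loan", "deposit"],
    "Bank (geography)": ["river", "land", "water", "slope"],
}
sense = ["financial", "institution", "deposit", "money"]
sentences = [
    ("She walked to the bank to open an account.", {"bank": "Bank (finance)"}),
    ("They fished from the bank.", {"bank": "Bank (geography)"}),
]

article = map_sense_to_article(sense, candidates)
instances = extract_instances(article, sentences, "bank")
```

Each extracted sentence is labeled with the article title, which stands in for the mapped WordNet sense. A cross-lingual variant, as the abstract describes, would presumably compare features in more than one language; how those scores are combined is not stated in the abstract and is left out here.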
Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 123, 2018, Pages 403-412