Article code: 6902019
Journal code: 1446498
Publication year: 2017
English article: 7 pages, PDF
Full-text version: free download
English title of the ISI article
A collocation extraction tool and two language resources for MSA
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract
Collocation extraction from corpora, whether complete or according to specific criteria, plays a significant role in computational linguistics, corpus linguistics, and natural language processing. In this paper, we present Musaheb, an Arabic collocation extraction tool that has been designed and implemented to overcome the limitations of existing collocation extraction tools. Musaheb can extract n-gram collocations (for n ≤ 5) in addition to extracting the collocates of specific word types within a window size of zero to 15 words. Moreover, it provides eight collocation statistics that may be used to calculate collocation strength, and permits the input of various constraints during node selection (that is, the determination of the word or phrase whose collocates we wish to search for) and collocate extraction. Based on the user preferences for the node, concordance, and collocate selection, Musaheb saves all nodes and their associated collocates in an XML file, allowing easy conversion to different formats. Two language resources for Modern Standard Arabic developed via Musaheb are presented in this paper: 1) a 2-gram language model, and 2) a 2-gram collocation list based on an Arabic newspaper corpus comprising more than 20 million words.
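The window-based collocate search and strength scoring described in the abstract can be illustrated with a minimal sketch. This is not Musaheb's implementation: the function names, the toy token list, and the choice of pointwise mutual information (PMI) as the strength statistic are all assumptions made for illustration, since the abstract does not name the eight statistics the tool provides.

```python
import math
from collections import Counter


def window_collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each occurrence of `node`.

    Hypothetical helper; the name and signature are not taken from Musaheb.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def pmi(pair_count, node_count, collocate_count, total):
    """Pointwise mutual information (base 2) computed from raw counts."""
    return math.log2((pair_count * total) / (node_count * collocate_count))


def score_collocates(tokens, node, window=2):
    """Map each collocate of `node` to its PMI score (an assumed statistic)."""
    freq = Counter(tokens)
    total = len(tokens)
    co = window_collocates(tokens, node, window)
    return {w: pmi(c, freq[node], freq[w], total) for w, c in co.items()}


# Invented toy corpus, used only to exercise the functions.
tokens = "strong tea and strong coffee but weak tea".split()
scores = score_collocates(tokens, "strong", window=1)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, round(score, 2))
```

A real tool would add node and collocate constraints and further statistics; the sketch only shows how a window size and a single association measure interact.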
Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 117, 2017, Pages 23-29