Article code | Journal code | Year of publication | English article | Full-text version
493365 | 721690 | 2012 | 5 pages | PDF
English title (ISI article)
Extracting Collocations from Bengali Text Corpus
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science (General)
Abstract

Automatic collocation extraction is important for many natural language processing applications, including machine translation, word sense disambiguation, information retrieval, language modelling for speech processing, and lexicography. How well collocations can be extracted depends heavily on how the text is preprocessed, so this paper describes a systematic preprocessing technique. The preprocessed data is then used to extract collocations with two methods: Point-wise Mutual Information and a Fuzzy Bi-gram Index. The paper focuses mainly on bigram extraction from a Bengali news corpus. Longer collocations, i.e., n-grams with n > 2, are then obtained by treating the extracted shorter collocations as individual words.
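The two ideas in the abstract that are concrete enough to illustrate are PMI bigram scoring and growing longer n-grams by treating accepted bigrams as single words. The sketch below is not the authors' implementation. It assumes the corpus has already been preprocessed into a flat list of tokens, and it estimates PMI(x, y) = log2(P(x, y) / (P(x) P(y))) from plain relative frequencies. The function names `pmi_bigrams` and `merge_collocations` and the parameters `min_count` and `joiner` are hypothetical. The Fuzzy Bi-gram Index is not shown because the abstract does not define it.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by point-wise mutual information.

    Assumption: probabilities are relative frequencies, i.e.
    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).
    Pairs seen fewer than `min_count` times (a hypothetical cutoff) are
    skipped, because PMI tends to overrate very rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def merge_collocations(tokens, collocations, joiner="_"):
    """Rewrite the token stream so each chosen bigram becomes one token.

    Scans left to right; when a pair matches, both words are consumed,
    so overlapping matches are resolved greedily. Rescoring the merged
    stream with pmi_bigrams then scores candidate trigrams, and so on.
    """
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            merged.append(tokens[i] + joiner + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

For example, with the toy stream `["a", "b", "c", "a", "b", "d", "a", "b"]`, only `("a", "b")` occurs at least twice, so it is the only scored pair. Merging it yields `["a_b", "c", "a_b", "d", "a_b"]`. In that stream, a bigram such as `("a_b", "c")` stands for the trigram "a b c". Choosing which pairs to merge, for example with a PMI threshold, is left open here because the abstract does not specify it.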

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Technology - Volume 4, 2012, Pages 325-329