Article code: 4960638
Journal code: 1446503
Publication year: 2017
English article: 10 pages
Full text: PDF
Title (ISI article)
A classification-based approach to the identification of Multiword Expressions (MWEs) in Magahi Applying SVM
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Science (General)
Abstract

Multiword expressions (MWEs) are crucial for any Natural Language Processing task because they occur frequently in every natural language. In addition, they “display a continuum of compositionality”. Although they are very frequent in informal spoken corpora, they occur less frequently in formal written corpora. Multiword expressions in Magahi can provide a unique platform and a gateway to research on other less-resourced Indian languages in general and on dialectal varieties of Hindi in particular. This is the first research project of its kind undertaken on Magahi. In this study, we apply a Support Vector Machine (SVM) classifier for the automatic identification and classification of multiword expressions. For this purpose, we use a POS-annotated corpus of approximately 75k word tokens, of which 11k tokens are multiword expressions. The raw data used in this study were crawled and sanitized with an Indian-language crawler known as IC Crawler and semi-automatically annotated with the ILCI annotation tool. The annotation tagset comprises nine labels adapted from Singh et al. The Magahi multiword extractor achieves an overall combined precision of 81.57%.
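The abstract does not spell out the feature set or the training procedure, so the following is only a rough illustration of SVM-based MWE identification under stated assumptions. It assumes that candidates are bigrams described solely by POS-tag features. It trains a linear SVM with Pegasos-style subgradient descent on the hinge loss, which is a swapped-in training method and not necessarily what the authors used. It then scores the output with precision, the metric the abstract reports. The tokens, simplified tags, labels, and helper names (`features`, `train_linear_svm`, `predict`, `precision`) are all hypothetical toy choices, not drawn from the Magahi corpus or the ILCI tagset.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A candidate bigram: (word1, pos1, word2, pos2). Words and tags are
# hypothetical placeholders, not real Magahi data or ILCI tags.
Candidate = Tuple[str, str, str, str]


def features(c: Candidate) -> Dict[str, float]:
    """Sparse binary features (hypothetical set: POS tags only)."""
    _, p1, _, p2 = c
    return {
        "bias": 1.0,
        f"pos1={p1}": 1.0,
        f"pos2={p2}": 1.0,
        f"pair={p1}_{p2}": 1.0,
    }


def dot(w: Dict[str, float], x: Dict[str, float]) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data: List[Candidate], labels: List[int],
                     lam: float = 0.01, epochs: int = 20) -> Dict[str, float]:
    """Linear SVM trained by Pegasos-style subgradient descent on the
    L2-regularised hinge loss. Labels are +1 (MWE) or -1 (not an MWE).
    Examples are visited in a fixed order, so training is deterministic."""
    w: Dict[str, float] = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for c, y in zip(data, labels):
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos step size
            x = features(c)
            margin = y * dot(w, x)
            for k in w:                    # shrink step from the regulariser
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:               # hinge-loss subgradient on violations
                for k, v in x.items():
                    w[k] += eta * y * v
    return dict(w)


def predict(w: Dict[str, float], c: Candidate) -> int:
    return 1 if dot(w, features(c)) >= 0.0 else -1


def precision(pred: List[int], gold: List[int]) -> float:
    """Share of candidates predicted as MWEs that really are MWEs."""
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == -1)
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical toy training data: noun + verb pairs labelled as MWEs.
train = [("a", "NN", "b", "VM"), ("c", "JJ", "d", "NN"),
         ("e", "NN", "f", "VM"), ("g", "NN", "h", "PSP"),
         ("i", "PRP", "j", "VM"), ("k", "NN", "l", "VM")]
train_y = [1, -1, 1, -1, -1, 1]
w = train_linear_svm(train, train_y)

test = [("m", "NN", "n", "VM"), ("o", "NN", "p", "VM"),
        ("q", "JJ", "r", "VM"), ("s", "NN", "t", "NN")]
test_y = [1, -1, -1, -1]
pred = [predict(w, c) for c in test]
print(pred, precision(pred, test_y))  # [1, 1, -1, -1] 0.5
```

In this toy run, the second test candidate shares its POS pair with the true MWEs but is labelled as a non-MWE. It therefore becomes a false positive and halves the precision. This suggests that POS features alone are unlikely to suffice, and that a practical extractor would probably add lexical or contextual features.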

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 112, 2017, Pages 594-603