Article code: 6902092
Journal code: 1446498
Publication year: 2017
English article: 6 pages
Full text: PDF
ISI article title (English)
Automatic minimal diacritization of Arabic texts
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Science (General)
Abstract
Modern Standard Arabic (MSA) is typically written without short vowels, even though these vowels help clarify the sense and meaning of a word. The short vowels are omitted because experienced Arabic readers can infer the meaning from context. However, there are ambiguous cases that even native Arabic speakers cannot resolve. The process of restoring the diacritical marks (short vowels) is known as diacritization. Most existing diacritization algorithms fully restore all the marks, many of which are trivial or unnecessary. In this paper, we present a system that restores diacritical marks only where they are most needed to resolve ambiguity. This is a more challenging problem than fully restoring all the diacritics. The system combines morphological analyzers with context similarity: the morphological analyzers generate all diacritized candidates for each word, and the model then eliminates word ambiguity through a statistical approach and context similarity. Out of 80 paragraphs, the system resolved 57 cases.
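The pipeline described in the abstract (generate candidates, disambiguate by context, then mark only what is ambiguous) can be illustrated with a small sketch. This is not the authors' system: the toy `LEXICON` stands in for a real morphological analyzer, and the prior counts, context-word profiles, bag-of-words cosine similarity, the 0.1 prior weight, and the rule "keep a mark only where the candidates disagree" are all hypothetical choices made for illustration. The helper names (`diacritize_minimally`, `minimal_marks`, and so on) are likewise invented.

```python
import math
import re
from collections import Counter

# Arabic tanwin, short vowels, shadda and sukun: U+064B..U+0652.
DIACRITICS = re.compile("[\u064B-\u0652]")

# HYPOTHETICAL toy lexicon standing in for a morphological analyzer.
# Each bare (undiacritized) form maps to its diacritized candidates, each
# with an invented prior count and a small set of typical context words.
LEXICON = {
    "درس": [
        ("دَرَسَ", 5, {"الطالب", "الجامعة"}),  # darasa, "he studied"
        ("دَرْس", 4, {"الأول", "كتاب"}),  # dars, "lesson"
        ("دَرَّسَ", 2, {"الأستاذ", "الطلاب"}),  # darrasa, "he taught"
    ],
    "علم": [
        ("عِلْم", 6, {"الطالب", "الجامعة"}),  # 'ilm, "knowledge"
        ("عَلَم", 2, {"رفع", "الدولة"}),  # 'alam, "flag"
    ],
}


def strip_diacritics(word: str) -> str:
    """Remove all diacritical marks, leaving the bare consonantal skeleton."""
    return DIACRITICS.sub("", word)


def split_marks(word: str) -> list[tuple[str, str]]:
    """Split a diacritized word into (base letter, attached marks) pairs."""
    pairs: list[tuple[str, str]] = []
    for ch in word:
        if DIACRITICS.match(ch) and pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def context_score(context: Counter, profile: set[str]) -> float:
    """Cosine similarity between sentence word counts and a 0/1 context profile."""
    if not context or not profile:
        return 0.0
    dot = sum(context[w] for w in profile)
    norm = math.sqrt(sum(c * c for c in context.values())) * math.sqrt(len(profile))
    return dot / norm


def choose(candidates, context: Counter) -> str:
    """Pick the candidate with the best context similarity plus a small prior bonus."""
    total = sum(prior for _, prior, _ in candidates)

    def score(cand):
        _, prior, profile = cand
        return context_score(context, profile) + 0.1 * prior / total

    return max(candidates, key=score)[0]


def minimal_marks(chosen: str, forms: list[str]) -> str:
    """Keep the chosen form's marks only at letters where the candidates disagree."""
    splits = [split_marks(f) for f in forms]
    out = []
    for i, (base, marks) in enumerate(split_marks(chosen)):
        variants = {s[i][1] for s in splits if i < len(s)}
        out.append(base + (marks if len(variants) > 1 else ""))
    return "".join(out)


def diacritize_minimally(sentence: str) -> str:
    """Diacritize only ambiguous words, and only their distinguishing marks."""
    tokens = sentence.split()
    context = Counter(strip_diacritics(t) for t in tokens)
    out = []
    for tok in tokens:
        bare = strip_diacritics(tok)
        candidates = LEXICON.get(bare, [])
        if len(candidates) < 2:
            out.append(bare)  # unknown or unambiguous: leave it bare
            continue
        chosen = choose(candidates, context)
        out.append(minimal_marks(chosen, [form for form, _, _ in candidates]))
    return " ".join(out)


print(diacritize_minimally("درس الطالب في الجامعة"))  # only "درس" gets marks
print(diacritize_minimally("رفع علم الدولة"))
```

In the first sentence, the context words favour darasa ("he studied"), and the fatha on the first letter is dropped because all three candidates share it; only the marks that separate darasa from dars and darrasa are kept. Words with zero or one candidate are left undiacritized, which is the "minimal" part of the idea.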
Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 117, 2017, Pages 169-174
Authors
, ,