Article code: 4960370
Journal code: 1364896
Publication year: 2017
English article: 8-page PDF
Full-text version: free download
English title of the ISI article
Morphological, syntactic and diacritics rules for automatic diacritization of Arabic sentences
Keywords
Arabic language, automatic diacritization, morphological analysis, hidden Markov model, smoothing techniques
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

Diacritical marks in Arabic are characters other than letters, and in most cases they are omitted from Arabic texts. This paper presents a hybrid system for the automatic diacritization of Arabic sentences that combines linguistic rules with statistical processing. The approach has four stages. The first performs a morphological analysis using the second version of the Alkhalil Morpho Sys morphological analyzer. The second uses the morphosyntactic output of that analysis to eliminate invalid word transitions according to syntactic rules. The third applies a discrete hidden Markov model and the Viterbi algorithm to determine the most probable diacritized sentence; transitions unseen in the training corpus are handled with smoothing techniques. The last stage deals with words that the Alkhalil analyzer could not analyze, for which letter-based statistical processing is used. The word error rate of the system is around 2.58% when the diacritic of the last letter of each word is ignored and around 6.28% when it is taken into account.
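
The third stage can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each undiacritized word already comes with a list of candidate diacritized forms (the role played by the morphological analyzer and the syntactic filter), that the hidden states are those candidates, and that transitions are word bigrams. The abstract does not say which smoothing techniques are used, so add-one (Laplace) smoothing is swapped in here as the simplest option. All function names are hypothetical, and the Latin-script tokens are stand-ins for diacritized Arabic words.

```python
import math
from collections import defaultdict

START = "<s>"  # artificial sentence-start state (an assumption of this sketch)

def train_bigrams(corpus):
    """Count bigram transitions between diacritized words in a training corpus."""
    context_counts = defaultdict(int)  # how often a word occurs as the left context
    bigrams = defaultdict(int)         # how often (previous, current) occurs
    vocab = set()
    for sentence in corpus:
        prev = START
        for word in sentence:
            context_counts[prev] += 1
            bigrams[(prev, word)] += 1
            vocab.add(word)
            prev = word
    return context_counts, bigrams, len(vocab)

def log_transition(prev, word, context_counts, bigrams, vocab_size):
    """Add-one smoothed log P(word | prev); unseen transitions stay nonzero."""
    seen = bigrams.get((prev, word), 0)
    return math.log((seen + 1) / (context_counts.get(prev, 0) + vocab_size))

def viterbi(candidates, context_counts, bigrams, vocab_size):
    """Return the most probable sequence of diacritized forms.

    candidates[i] lists the possible diacritized forms of the i-th word.
    Emission probabilities are implicit: every candidate is assumed to
    match its undiacritized word, so only transitions are scored.
    """
    def lt(p, c):
        return log_transition(p, c, context_counts, bigrams, vocab_size)

    # best[i][c] = (log-probability of the best path ending in c, backpointer)
    best = [{c: (lt(START, c), None) for c in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for c in candidates[i]:
            layer[c] = max((best[i - 1][p][0] + lt(p, c), p)
                           for p in candidates[i - 1])
        best.append(layer)

    # Backtrack from the best final state.
    last = max(best[-1], key=lambda c: best[-1][c][0])
    path = [last]
    for i in range(len(candidates) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy training data: transliterated stand-ins for diacritized sentences.
corpus = [["kataba", "alwaladu"],
          ["kataba", "alwaladu"],
          ["kutiba", "aldarsu"]]
context_counts, bigrams, vocab_size = train_bigrams(corpus)

# Candidate forms for a two-word undiacritized sentence.
lattice = [["kataba", "kutiba", "kutub"], ["alwaladu", "alwalada"]]
print(viterbi(lattice, context_counts, bigrams, vocab_size))
# -> ['kataba', 'alwaladu']
```

Because of the smoothing, a lattice made only of unseen forms, such as `[["kutub"], ["alwalada"]]`, still yields a path instead of failing on a zero probability.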

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of King Saud University - Computer and Information Sciences - Volume 29, Issue 2, April 2017, Pages 156-163