Article code: 10355217
Journal code: 867112
Publication year: 2005
English article: 9 pages, PDF
Full-text version: free download
English title of the ISI article
Character contiguity in N-gram-based word matching: the case for Arabic text searching
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract
This work assesses the performance of two N-gram matching techniques for Arabic root-driven string searching: contiguous N-grams and hybrid N-grams, which combine contiguous and non-contiguous N-grams. The two techniques were tested in three experiments involving different levels of textual word stemming, using a textual corpus of about 25 thousand words (about 160 KB in total) and a set of 100 textual query words. The hybrid approach showed a significant performance improvement over the conventional contiguous approach, especially in the cases where stemming was used. The present results and the inconsistent findings of previous studies raise some questions regarding the efficiency of pure conventional N-gram matching and the ways in which it should be used in languages other than English.
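The difference between the two techniques named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the use of bigrams (N = 2), the choice of "one skipped character" as the non-contiguous form, the Dice coefficient as the similarity measure, and the function names. The strings `ktb` and `katb` are Latin transliterations standing in for the Arabic root كتب and the derived word كاتب.

```python
def contiguous_bigrams(word):
    """Adjacent character pairs, e.g. 'katb' -> {'ka', 'at', 'tb'}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def skip_bigrams(word):
    """Pairs of characters separated by exactly one character.

    Assumed form of non-contiguity for this sketch, e.g. 'katb' -> {'kt', 'ab'}.
    """
    return {word[i] + word[i + 2] for i in range(len(word) - 2)}


def hybrid_bigrams(word):
    """Union of the contiguous and the (assumed) non-contiguous bigrams."""
    return contiguous_bigrams(word) | skip_bigrams(word)


def dice(a, b):
    """Dice coefficient of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def similarity(word1, word2, grams):
    """Similarity of two words under a given N-gram extraction function."""
    return dice(grams(word1), grams(word2))


if __name__ == "__main__":
    root, word = "ktb", "katb"  # transliterations of كتب and كاتب
    print(similarity(root, word, contiguous_bigrams))  # 0.4
    print(similarity(root, word, hybrid_bigrams))      # 0.5
```

In this toy case the infixed letter breaks the contiguous bigram `kt`, so only `tb` is shared; the skip bigram recovers `kt` from `katb`, which raises the score. This only hints at why non-contiguous N-grams may help with infix-heavy morphology; it does not reproduce the paper's experiments.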
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 41, Issue 4, July 2005, Pages 819-827