Article ID: 4962170
Journal ID: 1446526
Publication year: 2016
English article: 7-page PDF
Full-text version: free download
English title of the ISI article
On Continent and Script-Wise Divisions-Based Statistical Measures for Stop-words Lists of International Languages
Persian translation of the title
Statistical measures based on continent-wise and script-wise divisions for stop-word lists of international languages
Related subjects
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

The data for the current research work were collected for 42 different international languages spanning 3 continents, viz. Asia, Europe and South America. The data comprised unigram-model representations of the lexicons in the stop-words lists. 13 scripting systems comprising Arabic, Armenian, Bengali, Chinese, Cyrillic, Devanagari, Greek, Gurmukhi, Hanja & Hangul, Kana, Kanji, Marathi, Roman (Latin) and Thai were considered. Based on a comprehensive analysis of statistical measures for stop-words lists, it has been concluded that Asian languages are mostly self-scripted and that the average number of stop-words in Asian languages is higher than that in European languages. In addition to several other first-of-their-kind results, a very important inference from the current research work is that the average number of stop-words for any given language can be predicted to be 200.

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 89, 2016, Pages 313-319