Article code: 4955692
Journal code: 1444323
Publication year: 2017
English article: 11 pages
Full-text version: PDF
English title of the ISI article
Author gender identification from Arabic text
Persian translation of the title
شناسایی شناسه جنسیت از متن عربی
Keywords
Arabic text processing, Gender identification, Stylometric features, Bag-of-words
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Networks and Communications
Abstract
The Gender Identification (GI) problem is concerned with determining the gender of a given text's author. It has a wide range of academic and commercial applications in fields such as literature, security, forensics, and electronic markets and trading. To address this problem, researchers have proposed that the writing styles of authors of the same gender share certain aspects, which can be captured by stylometric features (SF). Another approach focuses mainly on keyword occurrences in each document; this is known as the Bag-Of-Words (BOW) approach. In this work, we study and compare both approaches, focusing on Arabic, a language for which this problem remains largely understudied despite its importance. To the best of our knowledge, no previous work has considered these approaches for GI in Arabic text. The comparison is carried out under different settings, and the results show that the SF approach, which is much cheaper to train, generates more accurate results under most settings. In fact, the best accuracy levels obtained by the SF and BOW approaches on our in-house dataset are 80.4% and 73.9%, respectively.
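As a rough illustration of the two feature families the abstract contrasts, the sketch below turns one document into a small stylometric feature dictionary and a bag-of-words count vector. The specific features (average word length, average sentence length, type-token ratio, punctuation ratio), the function names `stylometric_features` and `bag_of_words`, and the toy vocabulary are hypothetical choices for illustration only. The abstract does not list the paper's actual feature set, preprocessing, or classifier.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical sentence splitter: Latin . ! ? plus the Arabic question mark (U+061F).
SENTENCE_END = re.compile(r"[.!?\u061F]+")
# Punctuation counted as a style cue, including the Arabic comma (U+060C),
# Arabic semicolon (U+061B) and Arabic question mark (U+061F).
PUNCT = set(".,;:!?\u060C\u061B\u061F")
# In Python 3, \w on str patterns is Unicode-aware, so it matches Arabic letters.
WORD = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split text into word tokens (no stemming or diacritic handling)."""
    return WORD.findall(text)


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few illustrative stylometric features (not the paper's set)."""
    words = tokenize(text)
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    n_words = len(words) or 1  # avoid division by zero on empty input
    n_chars = len(text) or 1
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": len(words) / (len(sentences) or 1),
        "type_token_ratio": len(set(words)) / n_words,
        "punct_ratio": sum(ch in PUNCT for ch in text) / n_chars,
    }


def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary keyword occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


if __name__ == "__main__":
    doc = "ذهب الولد إلى المدرسة. هل عاد الولد؟"
    print(stylometric_features(doc))
    print(bag_of_words(doc, ["الولد", "المدرسة", "كتاب"]))  # [2, 1, 0]
```

Either vector would then be fed to a classifier trained on documents labeled with the author's gender. In this sketch, the stylometric vector has a fixed handful of dimensions, while the bag-of-words vector grows with the vocabulary. That difference is one plausible reason an SF model can be cheaper to train, though the abstract does not state the paper's reason.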
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Information Security and Applications - Volume 35, August 2017, Pages 85-95
Authors
, , , ,