Article code | Journal code | Publication year | English article | Full-text version
4961305 | 1446514 | 2016 | 8 pages | PDF
English title of the ISI article
Machine Learning Models of Text Categorization by Author Gender Using Topic-independent Features
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Science (General)
English abstract

In this article, we address the problem of automatically classifying texts by the author's gender. We used RusPersonality, a preexisting corpus of Russian-language texts labeled with information about their authors (gender, age, results of psychological testing, and so on). We performed a comparative study of machine learning techniques for gender attribution in Russian-language texts after deliberately removing gender bias in topic and genre. The resulting models for classifying Russian texts by author gender achieve accuracy close to, and in some cases above, the state of the art (up to 0.86 ± 0.03 accuracy and 86% F1-score).
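The abstract does not list the features or classifiers the authors compared. As a rough illustration of what "topic-independent features" and the reported metrics can look like, here is a minimal Python sketch. It assumes stylometric features (relative frequencies of a few Russian function words, punctuation per word, mean word length); the `FUNCTION_WORDS` list and every function name are hypothetical, not the authors' feature set. It also assumes that "0.86 +/- 0.03" means mean ± standard deviation across cross-validation folds and that F1 is macro-averaged over the two gender classes; the abstract states neither.

```python
import re
import statistics
from typing import Dict, List, Sequence

# HYPOTHETICAL feature list: a few frequent Russian function words.
# The article's actual feature set is not described in the abstract.
FUNCTION_WORDS = ["и", "в", "не", "на", "я", "что", "с", "он", "а", "как"]
PUNCTUATION = ".,;:!?"
WORD_RE = re.compile(r"\w+")  # \w matches Cyrillic letters for str patterns


def topic_independent_features(text: str) -> Dict[str, float]:
    """Map a text to style features that do not depend on topic vocabulary."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    n = len(words) or 1  # avoid division by zero on empty input
    # Normalize by text length so longer texts do not dominate.
    feats = {f"fw_{w}": words.count(w) / n for w in FUNCTION_WORDS}
    feats["punct_per_word"] = sum(ch in PUNCTUATION for ch in text) / n
    feats["mean_word_len"] = sum(len(w) for w in words) / n
    return feats


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of texts whose predicted gender label matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over the labels present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def summarize_folds(fold_scores: List[float]) -> str:
    """Format per-fold scores as 'mean +/- sample standard deviation'."""
    mean = statistics.mean(fold_scores)
    std = statistics.stdev(fold_scores) if len(fold_scores) > 1 else 0.0
    return f"{mean:.2f} +/- {std:.2f}"


if __name__ == "__main__":
    print(topic_independent_features("Я и ты, и он."))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    print(summarize_folds([0.7, 0.8, 0.9]))
```

For example, `summarize_folds([0.7, 0.8, 0.9])` returns `"0.80 +/- 0.10"`; these fold scores are made up for illustration and are not results from the article. Function words are a common choice for topic-independent features because they occur in almost every text regardless of subject, so a classifier built on them is less likely to learn topic or genre instead of author gender.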

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 101, 2016, Pages 135-142
Authors
, , , , ,