Article ID: 4961305
Journal: Procedia Computer Science
Published Year: 2016
Pages: 8 Pages
File Type: PDF
Abstract

In this article, we address the problem of automatically classifying texts by their author's gender. We used RusPersonality, a preexisting corpus of Russian-language texts labeled with information about their authors (gender, age, psychological test results, and so on). We carried out a comparative study of machine learning techniques for gender attribution in Russian-language texts after deliberately removing gender bias in topic and genre. The resulting models for classifying Russian texts by author gender reach accuracy close to, and in some cases above, the state of the art (up to 0.86 ± 0.03 accuracy and an F1-score of 86%).
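The abstract reports results as accuracy with a spread (0.86 ± 0.03) and as an F1-score. As a minimal sketch of how figures of that form are commonly computed, the Python below derives accuracy and F1 for a binary gender label and summarizes per-fold scores as mean and sample standard deviation. The function names, the "f"/"m" labels, and the assumption that the spread comes from repeated evaluation folds are illustrative only; the abstract does not describe the authors' evaluation protocol.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize(fold_scores):
    """Mean and sample standard deviation of per-fold scores (assumed protocol)."""
    return mean(fold_scores), stdev(fold_scores)


if __name__ == "__main__":
    # Hypothetical labels for four texts, not data from the paper.
    true_labels = ["f", "m", "f", "m"]
    predicted = ["f", "m", "m", "m"]
    print("accuracy:", accuracy(true_labels, predicted))
    print("F1 (class f):", round(f1_score(true_labels, predicted, "f"), 3))

    # Hypothetical per-fold accuracies, reported as mean +/- std.
    avg, spread = summarize([0.80, 0.90])
    print(f"accuracy over folds: {avg:.2f} +/- {spread:.2f}")
```

A spread reported this way describes variation across evaluation runs, so it is only comparable between studies that use the same splitting scheme.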

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science (General)