Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4961305 | Procedia Computer Science | 2016 | 8 Pages | |
Abstract
In this article, we address the problem of automatically classifying texts by their author's gender. We used RusPersonality, a preexisting corpus of Russian-language texts labeled with information about their authors (gender, age, psychological test results, and so on). We performed a comparative study of machine learning techniques for gender attribution in Russian-language texts after deliberately removing gender bias in topic and genre. The resulting models for classifying Russian texts by author gender perform close to, and in some cases above, the state of the art (accuracy of up to 0.86 ± 0.03 and an F1-score of up to 86%).
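The evaluation setup the abstract describes, gender attribution treated as binary text classification and scored as accuracy with a spread, can be illustrated with the minimal sketch below. It is not the authors' method. The abstract does not name the features or classifiers that were compared, so the bag-of-words multinomial Naive Bayes model, the whitespace `tokenize` helper, the `cross_validate` function and the toy corpus are all hypothetical. Reading "± 0.03" as the standard deviation of accuracy across cross-validation folds is likewise an assumption.

```python
import math
import random
import statistics
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing.

    Illustrative choice only; not the classifier(s) used in the article.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # log P(class) + sum over tokens of log P(word | class)
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def cross_validate(texts, labels, k=5, seed=0):
    """Return (mean, standard deviation) of accuracy over k shuffled folds."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        model = NaiveBayes().fit([texts[i] for i in train], [labels[i] for i in train])
        correct = sum(model.predict(texts[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical toy corpus with placeholder tokens (not RusPersonality data).
texts = ["alpha beta gamma"] * 10 + ["delta epsilon zeta"] * 10
labels = ["female"] * 10 + ["male"] * 10
mean, std = cross_validate(texts, labels, k=5)
print(f"Accuracy: {mean:.2f} +/- {std:.2f}")
```

The toy corpus is perfectly separable, so every fold scores 1.0 and the script prints `Accuracy: 1.00 +/- 0.00`. A real corpus such as RusPersonality is far noisier, and the topic and genre debiasing the abstract mentions would presumably be applied to the data before a step like this.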
Authors
Aleksandr Sboev, Tatiana Litvinova, Dmitry Gudovskikh, Roman Rybka, Ivan Moloshnikov,