Automatic gender identification of author of Russian text by machine learning and neural net algorithms in case of gender deception

Article ID	Journal	Published Year	Pages	File Type
6900929	Procedia Computer Science	2018	7 Pages	PDF

Abstract

We analyze approaches to identifying author gender in Russian-language texts that involve gender deception, using data-driven models based on conventional machine learning (Support Vector Classifier, Decision Tree, Gradient Boosting) and neural network algorithms (convolutional layers, long short-term memory layers, etc.). The training and testing data are text collections from the Gender Imitation corpus, expanded by crowdsourcing and supplemented with files from the RusProfiling and RusPersonality corpora. The accuracy reached at this stage of the task is presented and discussed.

Keywords

Deep neural networks; Natural Language Processing; Machine learning