Automated Detection of Human Users in Twitter

Article ID	Journal	Published Year	Pages	File Type
484832	Procedia Computer Science	2015	8 Pages	PDF

Abstract

This paper compares Support Vector Machine (SVM) classification with a number of clustering approaches for separating human from non-human users in Twitter, in order to identify normal human activity. The approaches achieve similar F1 scores of approximately 90%, and both have difficulty classifying human users who behave abnormally. A second-stage classification step was then used to further separate non-human users into brands, celebrities and promoters/information, achieving an average F1 score of 74%. These scores were achieved by reducing the size of the feature space with stepwise feature selection and by balancing the categories based on manual inspection of the classification results.
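The abstract does not describe the selection procedure in detail. As a minimal sketch, assuming greedy forward stepwise selection scored by macro-averaged F1 (the unweighted mean of per-class F1, one possible reading of "average F1") on a held-out validation split, the loop below adds one feature at a time and stops once no remaining feature improves the score. A nearest-centroid classifier is swapped in for the SVM so the example needs only the Python standard library; the function names, feature indices and toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean


def f1_score(y_true, y_pred, positive):
    """Per-class F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    return mean([f1_score(y_true, y_pred, c) for c in sorted(set(y_true))])


def nearest_centroid_predict(train_X, train_y, test_X, features):
    """Stand-in classifier (NOT an SVM): label each row with the class whose
    centroid, computed on the selected feature indices, is nearest."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean([r[f] for r in rows]) for f in features]
    predictions = []
    for x in test_X:
        # Squared Euclidean distance to each class centroid on the chosen features.
        distances = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, centre))
            for label, centre in centroids.items()
        }
        predictions.append(min(distances, key=distances.get))
    return predictions


def forward_stepwise_selection(train_X, train_y, val_X, val_y, n_features):
    """Greedy forward selection: repeatedly add the feature that most improves
    validation macro-F1; stop when no remaining feature improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        candidates = []
        for f in remaining:
            preds = nearest_centroid_predict(train_X, train_y, val_X, selected + [f])
            candidates.append((macro_f1(val_y, preds), -f))  # -f: prefer lower index on ties
        score, neg_f = max(candidates)
        if score <= best_score:
            break  # no candidate improves the current subset
        selected.append(-neg_f)
        remaining.remove(-neg_f)
        best_score = score
    return selected, best_score


# Hypothetical toy data: three numeric features per account, not the paper's features.
# Feature 1 separates the classes; features 0 and 2 mislead on the validation rows.
train_X = [[2, 1, 4], [4, 3, 6], [6, 9, 0], [8, 11, 2]]
train_y = ["human", "human", "not_human", "not_human"]
val_X = [[7, 2, 1], [3, 10, 5]]
val_y = ["human", "not_human"]

selected, score = forward_stepwise_selection(train_X, train_y, val_X, val_y, 3)
print(selected, score)  # [1] 1.0
```

In this toy run only feature 1 is kept: adding feature 0 or feature 2 leaves validation macro-F1 at 1.0, so the search stops. With real data, the stand-in classifier could be replaced by an SVM and the single validation split by cross-validation.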