Article ID: 1109632
Journal: Procedia - Social and Behavioral Sciences
Published Year: 2015
Pages: 7
File Type: PDF
Abstract

Electronic texts from emails, social networks, or mobile phones are currently of interest in Forensic Linguistics. Many of the texts analyzed are well under 200 words long. This work aims to identify text authorship by using part-of-speech tags over short texts. Our corpus consists of 28 texts taken from forum messages. The tokens of the corpus were annotated with part-of-speech (POS) tags provided by TreeTagger. A frequency vector based on POS features was created for each text, and the Euclidean distance between texts was calculated. Results show that 10 of the 14 test texts (71.4%) were correctly assigned to their author.
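
The pipeline described in the abstract (POS frequency vectors compared by Euclidean distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the texts have already been tagged (the TreeTagger step is omitted), it uses relative rather than raw frequencies, and it assigns a test text to the author of the nearest reference text. None of those choices is confirmed by the abstract. The tag names, the toy data, and the function names `pos_frequency_vector`, `euclidean`, and `attribute` are hypothetical.

```python
from collections import Counter
from math import sqrt


def pos_frequency_vector(tags, tagset):
    """Relative frequency of each tag in `tagset` for one tagged text.

    Assumption: relative frequencies are used so that texts of different
    lengths stay comparable; the abstract does not specify this.
    """
    counts = Counter(tags)
    total = len(tags)
    if total == 0:
        return [0.0] * len(tagset)
    return [counts[tag] / total for tag in tagset]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def attribute(test_tags, reference, tagset):
    """Return the author whose reference text is closest to the test text.

    `reference` maps an author label to the POS tags of one known text.
    Nearest-neighbour assignment is an assumption made for illustration.
    """
    test_vec = pos_frequency_vector(test_tags, tagset)
    return min(
        reference,
        key=lambda author: euclidean(
            test_vec, pos_frequency_vector(reference[author], tagset)
        ),
    )


# Toy, hand-made tag sequences (hypothetical, not taken from the paper's corpus).
tagset = ["DT", "NN", "VB", "JJ"]
reference = {
    "author_a": ["DT", "NN", "VB", "DT", "NN", "NN"],
    "author_b": ["JJ", "JJ", "NN", "VB", "VB", "JJ"],
}
test_text = ["DT", "NN", "VB", "NN"]

predicted = attribute(test_text, reference, tagset)
print(predicted)
```

With real data, each value in `reference` would be the tag sequence of a known text by that author, and accuracy would be the share of test texts assigned to the correct author, as in the 10 of 14 reported above.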
