Article ID Journal Published Year Pages File Type
4961804 Procedia Computer Science 2016 10 Pages PDF
Abstract
This paper discusses a novel time series methodology for writing process modeling, taking into account the dependency between sequentially written text parts. A series of consecutive sub-documents of a given document are represented via histograms of the appropriately chosen terms. To characterize the document overall style and its fluctuations, a new feature named the Mean Dependence is introduced. This similarity measure quantifies the association between a current sub-document and numerous earlier composed ones. So, such a collection of sub-documents is represented as a time series of the Mean Dependence development. The series change points naturally link to the style changes. Two possible approaches constructed within the general methodology are discussed. The first one intended to study media sources, is constructed to detect change points of media associated with social life transformations. Consequently, the homogeneous periods are detected using a new distance based on the Mean Dependence. The proposed methodology is applied to analysis of editorial texts published in the Egyptian “Al-Ahraam” and succeeds to indicate several important events connected to the “Arab Spring”. The second approach, based on the strictly stationary model of time series, is applied to authorship verification. Numerical experiments demonstrate high ability of the proposed methods to recognize an authorship and to expose writing style evolution.
Related Topics
Physical Sciences and Engineering Computer Science Computer Science (General)
Authors
, ,