Article code: 4961804
Journal code: 1446519
Publication year: 2016
English article: 10-page PDF
Full-text version: free download
English title of the ISI article
Text Classification Using a Novel Time Series based Methodology
Persian translation of the title (rendered in English)
Text classification using a novel time-series-based methodology
Keywords
Text classification, time series text model, authorship verification
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract
This paper presents a novel time series methodology for modeling the writing process that takes into account the dependency between sequentially written parts of a text. A series of consecutive sub-documents of a given document is represented via histograms of appropriately chosen terms. To characterize the overall style of the document and its fluctuations, a new feature named the Mean Dependence is introduced. This similarity measure quantifies the association between the current sub-document and a number of previously written ones. The collection of sub-documents is thus represented as a time series tracing how the Mean Dependence develops, and the change points of this series correspond naturally to changes in style. Two approaches built within the general methodology are discussed. The first, intended for the study of media sources, is designed to detect change points in media that are associated with transformations in social life; homogeneous periods are then detected using a new distance based on the Mean Dependence. This approach is applied to editorial texts published in the Egyptian “Al-Ahraam” and succeeds in identifying several important events connected to the “Arab Spring”. The second approach, based on a strictly stationary time series model, is applied to authorship verification. Numerical experiments demonstrate that the proposed methods are highly capable of recognizing authorship and exposing the evolution of writing style.
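The pipeline outlined in the abstract (sub-document histograms, then a per-sub-document average of similarity to earlier sub-documents) can be illustrated with a minimal sketch. The abstract does not state which association measure, window length, or term selection the authors use, so everything concrete here is an assumption: cosine similarity stands in for the association measure, and the names `histogram`, `cosine`, `mean_dependence_series`, and `window` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def histogram(tokens, vocabulary):
    """Relative term frequencies of one sub-document over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocabulary) or 1  # avoid division by zero
    return [counts[t] / total for t in vocabulary]


def cosine(u, v):
    """Cosine similarity. ASSUMPTION: a stand-in for the paper's association measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_dependence_series(sub_documents, vocabulary, window):
    """For each sub-document from index `window` onward, average its similarity
    to the `window` sub-documents written immediately before it."""
    hists = [histogram(doc, vocabulary) for doc in sub_documents]
    series = []
    for i in range(window, len(hists)):
        sims = [cosine(hists[i], hists[i - j]) for j in range(1, window + 1)]
        series.append(sum(sims) / window)
    return series


# Toy data (invented): four sub-documents in one style, then two in another.
vocab = ["the", "of", "and", "we"]
style_a = ["the", "of", "the", "and"]
style_b = ["we", "we", "and", "we"]
docs = [style_a] * 4 + [style_b] * 2
series = mean_dependence_series(docs, vocab, window=2)
```

On this toy input the series stays near 1 while the style is unchanged and drops sharply at the first sub-document in the new style, which is the kind of change point the abstract associates with a style change.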
Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 96, 2016, Pages 53-62