Article code: 515570 | Journal code: 867045 | Publication year: 2013 | English article: 14 pages (PDF)
English title of the ISI article
Authorship attribution based on a probabilistic topic model
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Science Software
English abstract

This paper describes, evaluates and compares the use of latent Dirichlet allocation (LDA) as an approach to authorship attribution. Based on this generative probabilistic topic model, we can model each document as a mixture of topic distributions, with each topic specifying a distribution over words. Based on author profiles (the aggregation of all texts written by the same writer), we suggest computing the distance to a disputed text to determine its possible writer. This distance is based on the difference between the two topic distributions. To evaluate different attribution schemes, we carried out an experiment based on 5408 newspaper articles (Glasgow Herald) written by 20 distinct authors. To complement this experiment, we used 4326 articles extracted from the Italian newspaper La Stampa and written by 20 journalists. This research demonstrates that the LDA-based classification scheme tends to outperform the Delta rule and the χ² distance, two classical approaches in authorship attribution based on a restricted number of terms. Compared to the Kullback–Leibler divergence, the LDA-based scheme can provide better effectiveness when considering a larger number of terms.
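
The attribution step described in the abstract can be illustrated with a minimal sketch. It assumes the topic distributions of the author profiles and of the disputed text have already been inferred by an LDA model, and it uses the L1 (sum of absolute differences) distance as one plausible reading of "the difference between the two topic distributions". The function names, the author labels and the numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def l1_distance(p: List[float], q: List[float]) -> float:
    """Sum of absolute differences between two topic distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    return sum(abs(a - b) for a, b in zip(p, q))


def attribute(disputed: List[float], profiles: Dict[str, List[float]]) -> str:
    """Return the author whose profile is closest to the disputed text."""
    return min(profiles, key=lambda author: l1_distance(disputed, profiles[author]))


# Hypothetical topic distributions over 3 topics (made-up numbers,
# standing in for the output of an LDA model).
profiles = {
    "author_a": [0.70, 0.20, 0.10],
    "author_b": [0.10, 0.30, 0.60],
}
disputed = [0.60, 0.30, 0.10]

print(attribute(disputed, profiles))  # prints "author_a"
```

In this toy case the disputed text is at distance 0.2 from author_a and 1.0 from author_b, so the nearest profile is author_a.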


► We propose using latent Dirichlet allocation (LDA) as an authorship attribution model.
► Experiments were based on English and Italian newspaper corpora.
► The LDA model produces better accuracy than the Delta or the chi-square approach.
► Its performance is similar to that of the Kullback–Leibler divergence (KLD) or the naïve Bayes approach.
► Examples of LDA as an explanatory tool are also given.
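
For comparison, two of the term-based baselines named above (KLD and the chi-square distance) can be sketched as distances between smoothed term distributions over a restricted vocabulary. This is only an illustration under stated assumptions: the add-one smoothing, the particular chi-square form, the tiny vocabulary and the toy texts are choices made here, and the paper's exact variants may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def smoothed_probs(tokens: List[str], vocab: List[str], lam: float = 1.0) -> Dict[str, float]:
    """Relative frequencies over a fixed vocabulary, with additive smoothing."""
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocab) + lam * len(vocab)
    return {t: (counts[t] + lam) / total for t in vocab}


def kld(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Kullback-Leibler divergence KL(p || q) over the shared vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def chi_square(p: Dict[str, float], q: Dict[str, float]) -> float:
    """One common chi-square distance: squared differences scaled by q."""
    return sum((p[t] - q[t]) ** 2 / q[t] for t in p)


# Hypothetical restricted vocabulary and toy texts (not from the corpora).
vocab = ["the", "of", "and", "to"]
author_profiles = {
    "author_a": smoothed_probs("the of the and the to of the".split(), vocab),
    "author_b": smoothed_probs("and to and to of and to the".split(), vocab),
}
query = smoothed_probs("the of the the and of".split(), vocab)

# Attribute the disputed text to the profile with the smallest distance.
best_kld = min(author_profiles, key=lambda a: kld(query, author_profiles[a]))
best_chi = min(author_profiles, key=lambda a: chi_square(query, author_profiles[a]))
print(best_kld, best_chi)
```

Smoothing keeps every probability strictly positive, so neither the logarithm in `kld` nor the division in `chi_square` can fail on terms that are absent from a text.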

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 49, Issue 1, January 2013, Pages 341–354