کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
978756 933303 2011 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Ranking of predictor variables based on effect size criterion provides an accurate means of automatically classifying opinion column articles
موضوعات مرتبط
مهندسی و علوم پایه ریاضیات فیزیک ریاضی
پیش نمایش صفحه اول مقاله
Ranking of predictor variables based on effect size criterion provides an accurate means of automatically classifying opinion column articles
چکیده انگلیسی

We demonstrate an accurate procedure based on linear discriminant analysis that allows automatic authorship classification of opinion column articles. First, we extract the following stylometric features of 157 column articles from four authors: statistics on high frequency words, number of words per sentence, and number of sentences per paragraph. Then, by systematically ranking these features based on an effect size criterion, we show that we can achieve an average classification accuracy of 93% for the test set. In comparison, frequency size based ranking has an average accuracy of 80%. The highest possible average classification accuracy of our data merely relying on chance is ∼31%. By carrying out sensitivity analysis, we show that the effect size criterion is superior than frequency ranking because there exist low frequency words that significantly contribute to successful author discrimination. Consistent results are seen when the procedure is applied in classifying the undisputed Federalist papers of Alexander Hamilton and James Madison. To the best of our knowledge, the work is the first attempt in classifying opinion column articles, that by virtue of being shorter in length (as compared to novels or short stories), are more prone to over-fitting issues. The near perfect classification for the longer papers supports this claim. Our results provide an important insight on authorship attribution that has been overlooked in previous studies: that ranking discriminant variables based on word frequency counts is not necessarily an optimal procedure.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Physica A: Statistical Mechanics and its Applications - Volume 390, Issue 1, 1 January 2011, Pages 110–119
نویسندگان
, , ,