Article ID | Journal | Published Year | Pages |
---|---|---|---|
10998001 | Information Processing & Management | 2018 | 13 Pages |
Abstract
In this work, we present the first quality flaw prediction study for articles containing the two most frequent verifiability flaws in Spanish Wikipedia: articles that do not cite any references or sources at all (termed Unreferenced) and articles that need additional citations for verification (termed Refimprove). Different state-of-the-art approaches were evaluated according to the underlying characteristics of each flaw. For articles not citing any references, a well-established rule-based approach was evaluated; the findings show that some of these articles actually suffer from the Refimprove flaw instead. For articles that need additional citations for verification, the well-known PU learning and one-class classification approaches were evaluated. In addition, new methods were compared and a new feature was proposed to model this latter flaw. The results show that new methods, such as under-bagged decision trees with sum or majority voting rules, biased-SVM, and centroid-based balanced SVM, outperform the previously published ones.
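To make the under-bagging idea mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary labels (1 for flawed, 0 otherwise), uses a depth-1 decision stump as a stand-in for a full decision tree, and combines the models with the majority voting rule only. The single feature (a reference-to-length ratio) and the toy values are hypothetical.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Fit a one-feature threshold classifier (a depth-1 decision tree)."""
    best = None  # (errors, feature index, threshold, label when value > threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for hi in (0, 1):
                preds = [hi if row[f] > t else 1 - hi for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, hi)
    _, f, t, hi = best
    return lambda row: hi if row[f] > t else 1 - hi


def under_bagging(X, y, n_models=11, seed=0):
    """Train each model on all minority examples plus an equally sized
    random sample (without replacement) of the majority class."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    models = []
    for _ in range(n_models):
        idx = minority + rng.sample(majority, len(minority))
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_majority(models, row):
    """Majority voting rule: return the label predicted by most models."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one feature per article (reference-to-length ratio).
X = [[0.10], [0.20], [0.15], [0.05],                  # flawed (label 1)
     [0.80], [0.90], [0.70], [0.85], [0.95], [0.75],  # not flawed (label 0)
     [0.60], [0.65], [0.90], [0.80], [0.70], [0.88]]
y = [1] * 4 + [0] * 12

models = under_bagging(X, y)
print(predict_majority(models, [0.12]), predict_majority(models, [0.82]))
```

An odd number of models avoids ties under majority voting. A sum rule would instead add up per-model class scores before choosing the label, which requires base learners that output scores rather than hard labels.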
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science Applications
Authors
Edgardo Ferretti, Leticia Cagnina, Viviana Paiz, Sebastián Delle Donne, Rodrigo Zacagnini, Marcelo Errecalde