Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6933969 | Journal of Informetrics | 2018 | 25 Pages | |
Abstract
Using two publication databases and four test data sets, we demonstrate the functionality of the framework and analyse the stability and discriminative power of the most common information retrieval evaluation measures. We find that there is no clear winner and that the performance of the evaluation measures is highly dependent on the underlying data. Our results show that the average rank is indeed an adequate and stable measure. However, we also show that relatively large performance differences are required to confidently determine whether one ranking algorithm is significantly superior to another. Lastly, we list alternative measures that also yield stable results and highlight measures that should not be used in this context.
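The average-rank measure mentioned in the abstract can be illustrated with a short sketch. The code below is not taken from the paper; it is a minimal illustration, assuming the measure is the mean 1-based position at which a set of test papers (e.g. award-winning papers) appears in a ranking produced by some algorithm, where lower values are better. All function and variable names are hypothetical.

```python
# Illustrative sketch of an average-rank evaluation measure (not the paper's code).
from typing import Iterable, Sequence


def average_rank(ranking: Sequence[str], relevant: Iterable[str]) -> float:
    """Mean 1-based rank position of the relevant items within `ranking`."""
    positions = {paper_id: idx + 1 for idx, paper_id in enumerate(ranking)}
    ranks = [positions[p] for p in relevant if p in positions]
    if not ranks:
        raise ValueError("none of the relevant items appear in the ranking")
    return sum(ranks) / len(ranks)


if __name__ == "__main__":
    # Algorithm A places the two test papers higher on average than algorithm B.
    ranking_a = ["p3", "p1", "p7", "p2", "p5"]
    ranking_b = ["p7", "p5", "p2", "p3", "p1"]
    test_set = ["p1", "p3"]
    print(average_rank(ranking_a, test_set))  # (1 + 2) / 2 = 1.5
    print(average_rank(ranking_b, test_set))  # (5 + 4) / 2 = 4.5
```

Under this reading, comparing two ranking algorithms amounts to comparing their average ranks on the same test set, which is why the abstract stresses how large a difference must be before one algorithm can be called significantly better.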
Keywords
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science Applications
Authors
Marcel Dunaiski, Jaco Geldenhuys, Willem Visser