Article ID Journal Published Year Pages File Type
381717 Engineering Applications of Artificial Intelligence 2007 11 Pages PDF
Abstract

The proportion of successful hits, usually referred to as “accuracy”, is by far the dominant meter for measuring classifiers’ accuracy. This is despite the fact that accuracy does not compensate for hits that can be attributed to mere chance. Is this a meaningful flaw in the context of machine learning? Have we been using the wrong meter for decades? The results of this study suggest that the answer to both questions is yes. Cohen's kappa, a meter that does compensate for random hits, was compared with accuracy, using a benchmark of fifteen datasets and five well-known classifiers. It turned out that the average probability of a hit being the result of mere chance exceeded one third (!). It was also found that the proportion of random hits varied across the classifiers, even when they were applied to a single dataset. Consequently, the rankings of classifiers’ accuracy, with and without compensation for random hits, differed from each other in eight out of the fifteen datasets. Therefore, accuracy may well fail in its main task, namely to properly measure the accuracy-wise merits of the classifiers themselves.
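
To make the contrast between the two meters concrete, the following is a minimal sketch of how accuracy and Cohen's kappa can be computed from a confusion matrix. It uses the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of hits and p_e is the proportion of hits expected by chance from the row and column totals. The function name `accuracy_and_kappa` and the two confusion matrices are hypothetical illustrations; they are not taken from the study's benchmark.

```python
import math


def accuracy_and_kappa(confusion):
    """Return (accuracy, chance_agreement, kappa) for a square confusion matrix.

    confusion[i][j] counts instances of true class i predicted as class j.
    The name and interface are illustrative assumptions, not the paper's code.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)

    # p_o: observed proportion of hits, i.e. plain accuracy.
    observed = sum(confusion[i][i] for i in range(k)) / n

    # p_e: proportion of hits expected by chance, from the marginal totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    # Kappa is undefined when chance agreement is already perfect.
    if expected == 1.0:
        return observed, expected, math.nan

    kappa = (observed - expected) / (1.0 - expected)
    return observed, expected, kappa


# Hypothetical balanced case: 80% accuracy, half of it attributable to chance.
balanced = [[45, 5],
            [15, 35]]

# Hypothetical imbalanced case: the classifier always predicts class 0.
always_majority = [[90, 0],
                   [10, 0]]

for name, matrix in [("balanced", balanced), ("always_majority", always_majority)]:
    acc, chance, kappa = accuracy_and_kappa(matrix)
    print(f"{name}: accuracy={acc:.2f} chance={chance:.2f} kappa={kappa:.2f}")
```

In the second matrix the classifier reaches 90% accuracy while kappa is zero, because every hit is accounted for by chance agreement given the class distribution. This is the kind of discrepancy that can change the ranking of classifiers on a dataset.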
