Article ID | Journal | Published Year | Pages |
---|---|---|---|
6865916 | Neurocomputing | 2015 | 9 Pages |
Abstract
This paper proposes a methodology for identifying data samples that are likely to be mislabeled in the dataset of a c-class classification problem. The methodology relies on the assumption that the generalization error of a model learned from the data decreases when the label of a mislabeled sample is changed to its correct class. The paper uses OP-ELM as a general classification model; OP-ELM also provides a fast way to estimate the generalization error through PRESS Leave-One-Out. The methodology is tested on two toy datasets and on real-life datasets; for one of the real-life datasets, expert knowledge about the identified potential mislabels was sought.
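The assumption stated in the abstract can be illustrated with a minimal sketch. This is not the authors' OP-ELM implementation: it swaps in a plain random-feature network with a ridge-regularized linear readout, and every name below (`hat_matrix`, `press_loo_mse`, `random_hidden_layer`, `find_suspect_labels`, `ridge`, `n_hidden`) is hypothetical. It tries each alternative label for each sample and reports the changes that lower the PRESS Leave-One-Out error.

```python
import numpy as np

def hat_matrix(H, ridge=1e-3):
    """Projection matrix P = H (H^T H + ridge*I)^-1 H^T of a linear readout."""
    G = H.T @ H + ridge * np.eye(H.shape[1])
    return H @ np.linalg.solve(G, H.T)

def press_loo_mse(P, Y):
    """PRESS leave-one-out MSE: each residual is divided by (1 - P_ii)."""
    resid = (Y - P @ Y) / (1.0 - np.diag(P))[:, None]
    return float(np.mean(resid ** 2))

def random_hidden_layer(X, n_hidden=10, seed=0):
    """Random tanh features (assumption: stands in for the OP-ELM hidden layer)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    return np.tanh(X @ W + b)

def find_suspect_labels(X, y, n_classes, n_hidden=10, seed=0):
    """Return the baseline PRESS error and the label changes that reduce it.

    Each suspect is (sample index, proposed class, error reduction),
    sorted by the largest reduction first.
    """
    H = random_hidden_layer(X, n_hidden, seed)
    P = hat_matrix(H)              # depends only on the inputs, so computed once
    Y = np.eye(n_classes)[y]       # one-hot targets
    base = press_loo_mse(P, Y)
    suspects = []
    for i in range(len(y)):
        original = Y[i].copy()
        for c in range(n_classes):
            if c == y[i]:
                continue
            Y[i] = np.eye(n_classes)[c]        # tentatively relabel sample i
            gain = base - press_loo_mse(P, Y)
            if gain > 0:                        # error decreased: possible mislabel
                suspects.append((i, c, gain))
        Y[i] = original                         # restore the original label
    suspects.sort(key=lambda s: s[2], reverse=True)
    return base, suspects
```

Because the projection matrix depends only on the inputs, each candidate relabeling costs one matrix product rather than a full retraining, which is why a PRESS-based estimate makes this kind of exhaustive search practical. OP-ELM additionally ranks and prunes hidden neurons, which this sketch omits.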
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Anton Akusok, David Veganzones, Yoan Miche, Kaj-Mikael Björk, Philippe du Jardin, Eric Severin, Amaury Lendasse