Article ID: 536153
Journal: Pattern Recognition Letters
Published Year: 2016
Pages: 8 pages
File Type: PDF
Abstract

• Noisy data decreases the classification accuracy of the induced classifier.
• Accuracy is improved by eliminating noisy instances from the dataset.
• Partial Instance Reduction (PIR) gave better accuracy than complete instance reduction.
• The new PIR methods make use of some of the valuable information in noisy instances.
• The new PIR methods are especially useful for scarce datasets.

Real-world data are usually noisy, causing many machine-learning algorithms to overfit the data. Various Instance Reduction (IR) techniques have been proposed to filter out noisy instances and clean the data. This paper presents Partial Instance Reduction (PIR), or partial outlier elimination, techniques. Unlike IR techniques, which eliminate all suspicious instances, PIR techniques partially eliminate a suspicious instance by removing some of its attribute values. If this fails to change the status of an instance from outlier to normal, the entire instance is eliminated. The main advantage of partial elimination is that it retains significant parts of instances, which is particularly useful when the training data is scarce. This paper compares PIR and IR techniques on 50 benchmark datasets, both with and without added noise. Our empirical results show that PIR techniques significantly outperform IR techniques on many benchmark datasets. Whereas IR techniques eliminate a large number of instances that are not outliers, PIR techniques manage to save parts of these instances that are useful for classification.
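The abstract describes PIR only at a high level and does not specify which outlier detector or attribute-elimination strategy the authors use. The sketch below is an illustrative Python approximation, not the paper's method: it assumes an ENN-style rule (an instance is suspicious if its label disagrees with the majority of its k nearest neighbours) and approximates partial elimination by masking one attribute value at a time before falling back to removing the whole instance. All names (is_outlier, partial_instance_reduction, k) are ours.

```python
# Minimal PIR-style filter (illustrative sketch, assumptions noted above).
import numpy as np

def is_outlier(X, y, i, mask, k=3):
    """Instance i is suspicious if its label disagrees with the majority
    label of its k nearest neighbours, using only unmasked attributes."""
    cols = np.flatnonzero(mask)
    d = np.linalg.norm(X[:, cols] - X[i, cols], axis=1)
    d[i] = np.inf                      # exclude the instance itself
    nn = np.argsort(d)[:k]
    votes = np.bincount(y[nn], minlength=y.max() + 1)
    return votes.argmax() != y[i]

def partial_instance_reduction(X, y, k=3):
    """Return the reduced X, y and a per-instance attribute mask.
    Suspicious instances are first partially reduced (one attribute
    value masked); only if no masking helps is the instance dropped."""
    n, m = X.shape
    keep, masks = [], []
    for i in range(n):
        full = np.ones(m, dtype=bool)
        if not is_outlier(X, y, i, full, k):
            keep.append(i); masks.append(full); continue
        # try eliminating each single attribute value of the suspicious instance
        for j in range(m):
            trial = full.copy(); trial[j] = False
            if not is_outlier(X, y, i, trial, k):
                keep.append(i); masks.append(trial); break
        # if no partial reduction changes the status, eliminate the whole instance
    idx = np.array(keep, dtype=int)
    return X[idx], y[idx], np.array(masks)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))
    y = (X[:, 0] > 0).astype(int)
    y[:5] = 1 - y[:5]                  # inject some label noise
    Xr, yr, masks = partial_instance_reduction(X, y)
    print(len(yr), "of", len(y), "instances retained")
```

In this sketch, the returned mask records which attribute values of each retained instance were kept, so downstream learners that can handle missing values could exploit the partially reduced instances.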

Graphical abstract: figure not reproduced here.

Related Topics
Physical Sciences and Engineering / Computer Science / Computer Vision and Pattern Recognition
Authors
, ,