A biclustering approach for classification with mislabeled data

Article ID	Journal	Published Year	Pages	File Type
382113	Expert Systems with Applications	2015	11 Pages	PDF

Abstract

• We propose a biclustering approach (BicNoise) for coping with mislabeled data.
• We assess three variants of BicNoise on binary classification problems.
• The proposed strategy is most useful when noise levels are high.

Labeling samples in large datasets is a demanding task prone to several sources of error. These errors, referred to as noise, can significantly degrade the performance of a classification algorithm because the classifier overfits to wrongly labeled data. So far, this problem has been treated by avoiding overfitting and by correcting mislabeled data through similarity analysis. The former approach can be affected by the curse of dimensionality, and some mislabeled data will not be corrected. In this paper, we investigate the use of a biclustering approach to capture local models of coherence across subsets of instances and attributes. These models are used to replace and augment the attributes of the original dataset. Through a systematic series of experiments, we assess the performance of the proposed approach, referred to as BicNoise, under different rates and types of label noise, and with different types of classifiers, binary datasets, and evaluation metrics. The results suggest that the transformed data can alleviate the dimensionality problem, reduce the redundancy of correlated features, and improve the separability of the data, thus improving classifier performance (most noticeably in the highest-noise settings).
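To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows (1) injecting label noise at a given rate into binary labels and (2) augmenting a dataset with one extra attribute per bicluster. The abstract does not say how BicNoise derives attributes from biclusters, so the feature used here (mean absolute deviation of an instance from the bicluster's mean profile over the bicluster's columns) is purely an assumption, as are the function names `flip_labels`, `bicluster_features`, and `augment`. The biclusters themselves are taken as given; finding them is outside the scope of the sketch.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return (noisy_labels, flipped_indices) for binary 0/1 labels.

    Assumption: uniform (symmetric) noise, i.e. a fixed fraction of
    instances chosen at random has its label flipped.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flipped = set(rng.sample(range(len(labels)), n_flip))
    noisy = [1 - y if i in flipped else y for i, y in enumerate(labels)]
    return noisy, sorted(flipped)


def bicluster_features(X, biclusters):
    """Compute one hypothetical feature per bicluster for every instance.

    X          : list of rows (lists of numbers).
    biclusters : list of (row_indices, col_indices) pairs, assumed given.
    Feature    : mean absolute deviation of the instance from the
                 bicluster's mean profile, restricted to its columns.
                 (Illustrative choice, not taken from the paper.)
    """
    profiles = []
    for rows, cols in biclusters:
        # Mean value of each bicluster column over the bicluster rows.
        profiles.append([sum(X[r][c] for r in rows) / len(rows) for c in cols])

    features = []
    for x in X:
        row = []
        for (_, cols), profile in zip(biclusters, profiles):
            dev = sum(abs(x[c] - p) for c, p in zip(cols, profile)) / len(cols)
            row.append(dev)
        features.append(row)
    return features


def augment(X, features):
    """Append the bicluster-derived features to the original attributes."""
    return [list(x) + f for x, f in zip(X, features)]


if __name__ == "__main__":
    X = [[1, 1, 9], [1, 1, 8], [5, 5, 0]]
    y = [0, 0, 1]
    biclusters = [([0, 1], [0, 1])]  # rows 0-1 are coherent on columns 0-1
    noisy_y, flipped = flip_labels(y, noise_rate=0.34, seed=1)
    X_aug = augment(X, bicluster_features(X, biclusters))
    print(noisy_y, flipped)
    print(X_aug)
```

In this toy setup, instances that belong to a bicluster get a small deviation value for it, while instances outside it get a larger one. The new attribute is computed without looking at the (possibly noisy) labels, which is one plausible reading of why such a transformation could help under heavy label noise.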

Keywords

Biclustering; Classification; Feature learning