Article code: 382113
Journal code: 660737
Publication year: 2015
English article: 11-page PDF
Full-text version: free download
English title of the ISI article
A biclustering approach for classification with mislabeled data
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract


• We propose a biclustering approach (BicNoise) for coping with mislabeled data.
• We assess three variants of BicNoise on binary classification problems.
• The proposed strategy is most useful at high noise levels.

Labeling samples on large data sets is a demanding task prone to different sources of errors. Those errors, denoted as noise, can significantly impact the performance of a classification algorithm due to overfitting of wrongly labeled data. So far, this problem has been treated by avoiding the overfitting and correcting mislabeled data through similarity analysis. The former approach can be affected by the curse of dimensionality and some mislabeled data will not be corrected. In this paper, we investigate the use of a biclustering approach to capture local models of coherence across subsets of instances and attributes. Those models are used to replace and augment the attributes of the original dataset. Through a systematic series of experiments, we have assessed the performance of the proposed approach, referred to as BicNoise, by considering different rates and types of label noise, and also different types of classifiers, binary datasets, and evaluation metrics. The good results achieved suggest that the transformed data can alleviate the dimensionality problem, reduce the redundancy of correlated features and improve the separability of the data, thus improving the classifier performance (most noticeably, in the highest noise settings).
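The abstract mentions experiments at different rates of label noise on binary datasets. As a rough illustration of what such a setting might look like, the sketch below flips a fixed fraction of binary labels uniformly at random. The uniform (symmetric) noise model, the function name `flip_labels`, and its parameters are assumptions made for this example; the listing does not say which noise types or procedures the authors actually used.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary (0/1) labels with a fraction flipped.

    Hypothetical helper: assumes uniform label noise, which is only one
    of several possible noise types and is not confirmed by the abstract.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    # Number of labels to corrupt, rounded to the nearest integer.
    n_flip = round(noise_rate * len(labels))
    # Choose distinct positions so exactly n_flip labels change.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


clean = [0, 1] * 10                      # 20 toy labels
noisy = flip_labels(clean, noise_rate=0.3)
n_changed = sum(a != b for a, b in zip(clean, noisy))
print(f"{n_changed} of {len(clean)} labels flipped")
```

A classifier trained on `noisy` and evaluated against `clean` would give one way to measure sensitivity to a given noise rate.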

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 42, Issue 12, 15 July 2015, Pages 5065–5075