کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
392405 664768 2015 20 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering
چکیده انگلیسی


• Noisy and borderline examples in imbalanced datasets harm classifier performance.
• Our proposal reduces the noise and makes the class boundaries more regular.
• Our proposal performs better than other re-sampling methods in this scenario.
• Ensemble-based noise filters work well when dealing with noise.
• The iterative noise elimination is a good approach to deal with noisy datasets.

Classification datasets often have an unequal class distribution among their examples. This problem is known as imbalanced classification. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most well-know data pre-processing methods to cope with it and to balance the different number of examples of each class. However, as recent works claim, class imbalance is not a problem in itself and performance degradation is also associated with other factors related to the distribution of the data. One of these is the presence of noisy and borderline examples, the latter lying in the areas surrounding class boundaries. Certain intrinsic limitations of SMOTE can aggravate the problem produced by these types of examples and current generalizations of SMOTE are not correctly adapted to their treatment.This paper proposes the extension of SMOTE through a new element, an iterative ensemble-based noise filter called Iterative-Partitioning Filter (IPF), which can overcome the problems produced by noisy and borderline examples in imbalanced datasets. This extension results in SMOTE–IPF. The properties of this proposal are discussed in a comprehensive experimental study. It is compared against a basic SMOTE and its most well-known generalizations. The experiments are carried out both on a set of synthetic datasets with different levels of noise and shapes of borderline examples as well as real-world datasets. Furthermore, the impact of introducing additional different types and levels of noise into these real-world data is studied. The results show that the new proposal performs better than existing SMOTE generalizations for all these different scenarios. The analysis of these results also helps to identify the characteristics of IPF which differentiate it from other filtering approaches.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Sciences - Volume 291, 10 January 2015, Pages 184–203
نویسندگان
, , , ,