Article code: 402515
Journal code: 676955
Publication year: 2012
English article: 9-page PDF
Full-text version: Free download
English title of the ISI article
On the effectiveness of preprocessing methods when dealing with different levels of class imbalance
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

This paper investigates how both the imbalance ratio and the classifier influence the performance of several resampling strategies for dealing with imbalanced data sets. The study evaluates how learning is affected when different resampling algorithms transform the originally imbalanced data into artificially balanced class distributions. Experiments over 17 real data sets using eight different classifiers, four resampling algorithms and four performance evaluation measures show that over-sampling the minority class consistently outperforms under-sampling the majority class when data sets are strongly imbalanced, whereas there are no significant differences for databases with low imbalance. Results also indicate that the classifier has very little influence on the effectiveness of the resampling strategies.


► We study the influence of the imbalance ratio and the classifier on resampling.
► Over-sampling performs better than under-sampling on severely imbalanced data sets.
► With moderate or low imbalance, over-sampling and under-sampling give similar performance.
► The way the data are preprocessed matters more than the classifier used.
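
As a rough illustration of the two resampling families compared above, the following minimal sketch computes an imbalance ratio and balances a toy data set by random over-sampling and random under-sampling. The abstract does not name the four resampling algorithms used in the paper, so plain random resampling is an assumption here, and the function names (imbalance_ratio, random_oversample, random_undersample) and the toy data are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class so that every class
    is as small as the smallest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_samples, out_labels = [], []
    for cls in counts:
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for s in rng.sample(pool, target):
            out_samples.append(s)
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data (hypothetical): 90 majority samples (label 0), 10 minority samples (label 1).
X = list(range(100))
y = [0] * 90 + [1] * 10

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
```

In this toy case over-sampling keeps all 100 original samples and adds duplicated minority samples, while under-sampling discards most of the majority class. That loss of majority-class information is one plausible reason, offered here only as an interpretation, why under-sampling might suffer more as imbalance grows.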

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 25, Issue 1, February 2012, Pages 13–21