Article code: 530951
Journal code: 869802
Publication year: 2013
English article: 12-page PDF
Full-text version: Free download
English title of the ISI article
EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
English abstract


• A new ensemble solution for the class-imbalance problem is proposed.
• The accuracy of the base classifiers is enhanced by evolutionary undersampling.
• A diversity exploitation mechanism is developed, which is useful to deal with imbalance.
• EUSBoost outperforms the state-of-the-art ensembles in highly imbalanced data-sets.
• EUSBoost behavior is explained via kappa-AUC error diagrams.
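The last highlight refers to kappa-based diagrams. In the standard kappa-error diagram, each point represents a pair of base classifiers, with the kappa agreement between their outputs on one axis and their average error on the other. The sketch below computes only that pairwise kappa, as an illustration of how ensemble diversity can be measured; the function name `pairwise_kappa` is hypothetical, and how the paper replaces the error axis with an AUC-based measure is not specified here.

```python
from collections import Counter


def pairwise_kappa(pred_a, pred_b):
    """Cohen's kappa between the label outputs of two classifiers.

    1.0 means identical outputs, 0.0 means agreement no better than
    chance, and negative values mean less agreement than chance.
    """
    n = len(pred_a)
    # Observed agreement: fraction of instances given the same label.
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    # Chance agreement, from each classifier's own label frequencies.
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        # Both classifiers always output the same single label.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Lower kappa between two base classifiers indicates more diverse outputs, which is the property a diversity mechanism tries to encourage.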

Classification with imbalanced data-sets has become one of the most challenging problems in Data Mining. When one class is much more heavily represented than the other, both the learning and classification processes suffer undesirable effects, mainly with regard to the minority class. Tackling this problem requires accurate tools; lately, ensembles of classifiers have emerged as a possible solution. Among ensemble proposals, the combination of Bagging and Boosting with preprocessing techniques has proved its ability to enhance the classification of the minority class.

In this paper, we develop a new ensemble construction algorithm (EUSBoost) based on RUSBoost, one of the simplest and most accurate ensembles, which combines random undersampling with the Boosting algorithm. Our methodology aims to improve on existing proposals by enhancing the performance of the base classifiers through evolutionary undersampling. In addition, we promote diversity by favoring the use of different subsets of majority class instances to train each base classifier. Focusing on two-class highly imbalanced problems, we show, supported by proper statistical analysis, that EUSBoost is able to outperform the state-of-the-art ensemble-based methods. We also analyze its advantages using kappa-error diagrams, which we adapt to the imbalanced scenario.
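The idea of combining random undersampling with Boosting can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses plain discrete AdaBoost with decision stumps (not necessarily the Boosting variant or base learner used by RUSBoost or EUSBoost), it samples the majority class at random where EUSBoost would use evolutionary undersampling, and all function names (`undersample`, `fit_stump`, `rus_boost`, `predict`) and the toy data are hypothetical. Labels are assumed to be +1 for the minority class and -1 for the majority class.

```python
import math
import random


def undersample(labels, rng, minority=1):
    """Indices of all minority instances plus an equal-sized random majority subset."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    k = min(len(minority_idx), len(majority_idx))
    return minority_idx + rng.sample(majority_idx, k)


def stump_predict(stump, x):
    """Predict +1/-1 from a (feature, threshold, polarity) stump."""
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity


def fit_stump(X, y, w, idx):
    """Pick the stump with the lowest weighted error on the instances in idx."""
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({X[i][feature] for i in idx}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w[i] for i in idx if stump_predict(stump, X[i]) != y[i])
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_stump


def rus_boost(X, y, rounds=10, seed=0):
    """Boosting loop that trains each stump on a fresh balanced subsample."""
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # A new majority subset is drawn every round, which adds diversity.
        idx = undersample(y, rng)
        stump = fit_stump(X, y, w, idx)
        # The weighted error is measured on the full training set.
        err = sum(w[i] for i in range(n) if stump_predict(stump, X[i]) != y[i])
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified instances, then renormalize.
        w = [w[i] * math.exp(-alpha * y[i] * stump_predict(stump, X[i])) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy imbalanced problem: 3 minority (+1) versus 17 majority (-1) instances.
X = [[float(i)] for i in range(20)]
y = [1 if i >= 17 else -1 for i in range(20)]
model = rus_boost(X, y, rounds=5)
```

In this sketch, an evolutionary variant would replace the body of `undersample` with a search over candidate majority subsets, leaving the rest of the loop unchanged.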

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 46, Issue 12, December 2013, Pages 3460–3471