Article code: 404876
Journal code: 677459
Publication year: 2015
English article: 17-page PDF
Full-text version: free download
English title of the ISI article
Coverage-based resampling: Building robust consolidated decision trees
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract


• Coverage-based resampling determines the number of samples to be used based on the dataset’s class distribution.
• Consolidated trees achieve better results for most performance measures when using higher coverage values.
• CTC ranks first against multiple genetics-based and classical algorithms for rule induction.
• CTC combined with SMOTE tops state-of-the-art techniques designed to tackle class imbalance.

The class imbalance problem has attracted a lot of attention from the data mining community recently, becoming a current trend in machine learning research. The Consolidated Tree Construction (CTC) algorithm was proposed to solve a classification problem involving a high degree of class imbalance without losing explanatory capacity, a desirable characteristic of single decision trees and rule sets. CTC works by resampling the training sample and building a tree from each subsample, in a similar manner to ensemble classifiers, but it applies the ensemble process during the tree construction phase, resulting in a single final tree. At the ECML/PKDD 2013 conference the term “Inner Ensembles” was coined to refer to such methodologies. In this paper we propose a resampling strategy for classification algorithms that use multiple subsamples. This strategy is based on the class distribution of the training sample to ensure a minimum representation of all classes when resampling. This strategy has been applied to CTC over different classification contexts. A robust classification algorithm should not just be able to rank in the top positions for certain classification problems but should be able to excel when faced with a broad range of problems. In this paper we establish the robustness of the CTC algorithm against a wide set of classification algorithms with explanatory capacity.
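
The abstract does not state the formula that links coverage to the number of subsamples, so the following is only a minimal sketch of one plausible reading. It assumes that each subsample is class-balanced and drawn without replacement at the size of the smallest class, and that "coverage" means the expected fraction of the largest class appearing in at least one subsample. The function name subsamples_for_coverage and its parameters are hypothetical and are not taken from the paper.

```python
import math


def subsamples_for_coverage(class_counts, coverage):
    """Hypothetical helper (not the paper's code).

    Assumes each subsample takes min-class-size examples from every class,
    so one subsample holds a fraction r = minority / majority of the largest
    class. The expected fraction of that class seen in at least one of n
    independent subsamples is then 1 - (1 - r) ** n.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")
    minority = min(class_counts.values())
    majority = max(class_counts.values())
    if minority == majority:
        # Already balanced: a single subsample contains every example.
        return 1
    r = minority / majority
    # Smallest n satisfying 1 - (1 - r) ** n >= coverage.
    return math.ceil(math.log(1.0 - coverage) / math.log(1.0 - r))


if __name__ == "__main__":
    # Illustrative class counts only; not data from the paper.
    counts = {"positive": 50, "negative": 950}
    for c in (0.5, 0.9, 0.99):
        print(c, subsamples_for_coverage(counts, c))
```

Under these assumptions, the required number of subsamples grows as the class distribution becomes more skewed or as the requested coverage rises, which matches the highlight that the number of samples depends on the dataset's class distribution.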

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 79, May 2015, Pages 51–67