A genetic algorithm-based rule extraction system

Applied Soft Computing, 2012 (17 pages)

Abstract

Individual classifiers predict the class of unknown objects. However, they are usually domain-specific, and their predictions do not scale up well to data sets that are very large, high-dimensional, or have an imbalanced class distribution. This article introduces an accuracy-based learning system called DTGA (decision tree and genetic algorithm). DTGA aims to improve prediction accuracy on any classification problem, irrespective of domain, size, dimensionality and class distribution. The proposed system has two rule-induction phases. In the first phase, a base classifier, C4.5 (a decision-tree-based rule inducer), produces rules from the training data set. In the second phase, a GA (genetic algorithm) refines these rules to obtain more accurate, higher-performing rules for prediction. The system has been compared with competitive non-GA-based systems: a neural network, Naïve Bayes, a rule-based classifier using rough set theory, and C4.5 (i.e., the base classifier of DTGA). The comparison used a number of benchmark data sets from the UCI (University of California, Irvine) machine learning repository. Empirical results show that the proposed hybrid approach yields a marked improvement in a number of cases.
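The abstract does not specify DTGA's rule encoding, genetic operators or fitness function. The sketch below therefore only illustrates the general two-phase idea, and every detail in it is our own assumption.

- A small hand-written threshold tree stands in for C4.5's output.
- Each root-to-leaf path becomes one rule.
- A simple elitist GA refines the rule set. It perturbs rule thresholds, recombines rules between rule sets, and uses training accuracy as the fitness.

All names (`tree_to_rules`, `refine`, `mutate`, etc.) are hypothetical. This is not the authors' implementation.

```python
import random
from typing import List, Tuple

# ASSUMPTION: hypothetical encoding, not DTGA's.
# A condition is (attribute_index, lower, upper), meaning lower < x[attr] <= upper.
Condition = Tuple[int, float, float]
Rule = Tuple[List[Condition], int]  # (conditions, predicted class)
INF = float("inf")

def tree_to_rules(tree, conds=None) -> List[Rule]:
    """Phase 1 stand-in: one rule per root-to-leaf path of a threshold tree.
    A node is an int (leaf class) or (attr, threshold, left, right);
    x[attr] <= threshold goes left."""
    conds = conds or []
    if isinstance(tree, int):
        return [(list(conds), tree)]
    attr, thr, left, right = tree
    return (tree_to_rules(left, conds + [(attr, -INF, thr)])
            + tree_to_rules(right, conds + [(attr, thr, INF)]))

def predict(rules: List[Rule], x, default: int) -> int:
    # First matching rule wins; otherwise fall back to the default class.
    for conds, label in rules:
        if all(lo < x[a] <= hi for a, lo, hi in conds):
            return label
    return default

def fitness(rules, X, y, default) -> float:
    # Training accuracy of the whole rule set.
    return sum(predict(rules, x, default) == t for x, t in zip(X, y)) / len(y)

def mutate(rules, rng, scale=0.5):
    """Copy the rule set, then shift one finite bound of one condition."""
    new = [(list(conds), label) for conds, label in rules]
    conds, _ = rng.choice(new)
    if conds:
        i = rng.randrange(len(conds))
        a, lo, hi = conds[i]
        if hi != INF and (lo == -INF or rng.random() < 0.5):
            hi += rng.gauss(0, scale)
        elif lo != -INF:
            lo += rng.gauss(0, scale)
        conds[i] = (a, lo, hi)
    return new

def crossover(a, b, rng):
    # Uniform crossover: each rule position is taken from either parent.
    return [ra if rng.random() < 0.5 else rb for ra, rb in zip(a, b)]

def refine(seed, X, y, default, pop_size=20, generations=40, seed_value=0):
    """Phase 2 stand-in: elitist GA seeded with the tree-derived rules."""
    rng = random.Random(seed_value)
    pop = [seed] + [mutate(seed, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda r: fitness(r, X, y, default), reverse=True)
        elite = pop[: pop_size // 2]  # top half survives unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda r: fitness(r, X, y, default))

# Toy data: the true boundary lies between 2.8 and 3.5, but the "tree" splits at 2.0.
X = [[0.5], [1.0], [1.5], [2.2], [2.5], [2.8], [3.5], [4.0], [4.5], [5.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
default_class = 0  # ASSUMPTION: majority class as the fallback
seed_rules = tree_to_rules((0, 2.0, 0, 1))
seed_acc = fitness(seed_rules, X, y, default_class)
best = refine(seed_rules, X, y, default_class)
best_acc = fitness(best, X, y, default_class)
print(f"seed accuracy {seed_acc:.2f}, refined accuracy {best_acc:.2f}")
```

The seed rule set is part of the initial population, and the top half of each generation survives. As a result, the refined rules never score below the tree-derived seed on the training data. Whether they also generalize better is a separate question that requires held-out data.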

► We model a hybrid evolutionary classification system that combines C4.5 and a GA.
► This study seeks to improve prediction accuracy on classification problems irrespective of domain, size, dimensionality and class distribution.
► We also consider learning time.
► Experimental results demonstrate the strength of the system.

Keywords

C4.5; Genetic algorithm; Accuracy; Hybrid system; Classification