Article ID: 386249
Journal: Expert Systems with Applications
Published Year: 2014
Pages: 13
File Type: PDF
Abstract

• A new algorithm inspired by C4.5 and imprecise probabilities is defined.
• This algorithm (Credal-C4.5) assumes that the data set is unreliable when the tree is built.
• A pruning process is also incorporated into Credal-C4.5.
• Several experiments are carried out to compare the algorithms.
• The new method is especially suitable for classifying noisy data sets.

In the area of classification, C4.5 is a well-known algorithm widely used to build decision trees. In this algorithm, a pruning process is carried out to address the problem of over-fitting. This paper presents a modification of C4.5 called Credal-C4.5. The new procedure relies on a mathematical theory based on imprecise probabilities and uncertainty measures. Credal-C4.5 estimates the probabilities of the features and of the class variable by using imprecise probabilities. It also uses a new split criterion, called Imprecise Information Gain Ratio, which applies uncertainty measures to convex sets of probability distributions (credal sets). In this manner, Credal-C4.5 builds trees for solving classification problems under the assumption that the training set is not fully reliable. We carried out several experimental studies comparing the new procedure with other ones, and we obtained the following principal conclusion: in domains with class noise, Credal-C4.5 produces smaller trees and better performance than classic C4.5.
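
The split criterion described above can be illustrated with a minimal sketch. The abstract does not specify how the credal sets are built or which uncertainty measure is applied, so the code assumes an imprecise Dirichlet model with hyperparameter s = 1 and the maximum entropy over the resulting credal set. The function names (max_entropy_counts, upper_entropy, imprecise_gain_ratio) and the table layout are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def max_entropy_counts(counts, s=1.0):
    """Spread the extra mass s over the smallest counts (water-filling).

    Assumption: the credal set comes from an imprecise Dirichlet model,
    so each count may grow by at most s and the total grows by exactly s.
    """
    if not counts:
        return []
    ordered = sorted(float(c) for c in counts)
    remaining = s
    level = ordered[0]
    k = 1  # number of entries currently raised to the common level
    while k < len(ordered):
        need = (ordered[k] - level) * k
        if need >= remaining:
            break
        remaining -= need
        level = ordered[k]
        k += 1
    level += remaining / k
    return [max(float(c), level) for c in counts]


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def max_entropy_probs(counts, s=1.0):
    """Maximum-entropy distribution of the assumed credal set."""
    adjusted = max_entropy_counts(counts, s)
    total = sum(adjusted)
    return [a / total for a in adjusted]


def upper_entropy(counts, s=1.0):
    """Maximum entropy over the assumed credal set."""
    return entropy(max_entropy_probs(counts, s))


def imprecise_gain_ratio(table, s=1.0):
    """table maps each feature value to its list of class counts."""
    rows = list(table.values())
    class_totals = [sum(col) for col in zip(*rows)]
    # Feature probabilities are also taken from a credal set (assumption).
    weights = max_entropy_probs([sum(r) for r in rows], s)
    gain = upper_entropy(class_totals, s) - sum(
        w * upper_entropy(r, s) for w, r in zip(weights, rows)
    )
    split_info = entropy(weights)
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    informative = {"a": [10, 0], "b": [0, 10]}
    uninformative = {"a": [5, 5], "b": [5, 5]}
    print(imprecise_gain_ratio(informative))
    print(imprecise_gain_ratio(uninformative))
```

In this sketch a pure branch such as [10, 0] still keeps some entropy after the extra mass is added, so a split never looks perfectly clean. That behaviour is in the spirit of building the tree under the assumption that the training labels are not fully reliable.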

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence