Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
386249 | 660881 | 2014 | 13-page PDF | Free download |

• A new algorithm inspired by C4.5 and imprecise probabilities is defined.
• This algorithm (Credal-C4.5) assumes unreliable data sets when the tree is built.
• A pruning process is also incorporated into Credal-C4.5.
• Several experiments are carried out to compare the algorithms.
• The new method is especially suitable for classifying data sets with noise.
In the area of classification, C4.5 is a well-known algorithm widely used to build decision trees. In this algorithm, a pruning process is carried out to address the problem of over-fitting. A modification of C4.5, called Credal-C4.5, is presented in this paper. This new procedure uses a mathematical theory based on imprecise probabilities and uncertainty measures. Credal-C4.5 estimates the probabilities of the features and the class variable by using imprecise probabilities. In addition, it uses a new split criterion, called Imprecise Information Gain Ratio, which applies uncertainty measures on convex sets of probability distributions (credal sets). In this manner, Credal-C4.5 builds trees for solving classification problems under the assumption that the training set is not fully reliable. We carried out several experimental studies comparing this new procedure with other ones and obtained the following principal conclusion: in domains with class noise, Credal-C4.5 obtains smaller trees and better performance than classic C4.5.
Journal: Expert Systems with Applications - Volume 41, Issue 10, August 2014, Pages 4625–4637