Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6855655 | Expert Systems with Applications | 2016 | 13 Pages |
Abstract
Credal Decision Trees (CDTs) are algorithms for building classifiers based on imprecise probabilities and uncertainty-based information measures. They are well suited to classifying noisy data sets. This paper explains that suitability in terms of the split criterion used in the new procedure of a particular type of CDT, called Credal-C4.5: this criterion is more robust to noise than the split criteria of classic decision trees. All CDTs also depend on a parameter s that is directly related to the size of the tree built. The well-known C4.5 procedure and Credal-C4.5 are equivalent when s=0. The paper studies the relation between the value of s and the capacity of Credal-C4.5 to classify data sets with label noise. An experimental study compares Credal-C4.5 with different values of s on data sets with different levels of label noise. The experiments show that, if the noise level of a data set can be estimated, choosing the right value of s is key to obtaining notably better results.
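The role of s can be illustrated with a minimal sketch. The abstract does not name the underlying model; the code below assumes the imprecise Dirichlet model commonly used in the CDT literature, where each class probability lies in [n_i/(N+s), (n_i+s)/(N+s)], and computes the maximum Shannon entropy over that set. The function name `max_entropy_idm` is hypothetical, and this is only the uncertainty measure, not the full Credal-C4.5 split criterion.

```python
import math

def max_entropy_idm(counts, s):
    """Maximum Shannon entropy (in bits) over the credal set that an
    imprecise Dirichlet model with parameter s assigns to class counts.
    Assumption: this model is not stated in the abstract itself."""
    total = sum(counts)
    if total + s == 0:
        return 0.0
    # Water-filling: spread the extra mass s over the smallest counts
    # so that the resulting distribution is as uniform as possible.
    order = sorted(counts)
    remaining = s
    level = order[0]
    k = 1  # number of classes currently raised to the common level
    while k < len(order) and remaining >= k * (order[k] - level):
        remaining -= k * (order[k] - level)
        level = order[k]
        k += 1
    level += remaining / k
    probs = [max(c, level) / (total + s) for c in counts]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# With s = 0 the credal set collapses to the relative frequencies,
# so the measure reduces to the ordinary Shannon entropy.
print(max_entropy_idm([3, 1], 0))
print(max_entropy_idm([3, 1], 1))
```

Under this assumption, a larger s gives the minority class more probability mass, which raises the estimated uncertainty of a node and is consistent with the abstract's statement that s affects the size of the tree.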
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Carlos J. Mantas, Joaquín Abellán, Javier G. Castellano