Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
383701 | Expert Systems with Applications | 2013 | 12 Pages |
The study of data complexity metrics is an emerging area in data mining that analyzes characteristics of data sets in order to extract knowledge from them. This information can be used to support the selection of an appropriate classification algorithm. This paper analyzes the relationship between data complexity measures and classifier behavior. Each metric is evaluated across its range of values, and classifier accuracy is studied over those values. The results show how useful these measures are, and which of them allow us to analyze the nature of the input data set and help decide which classification method is likely to be the most promising.
► Data complexity metrics cannot be used to establish relationships among data sets. ► Each metric is meaningful within a single data set on its own (as being high or low). ► The measures with a clear relation to classifier performance are F1, F2, F3, L1 and N1. ► The metrics that do not offer interesting information are N2, N3, L2, L3, N4 and T1.
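To make one of the measures named above concrete, the following is a minimal sketch of F1, assuming the definition commonly used in the data complexity literature (the maximum Fisher's discriminant ratio over all features, for a two-class problem). The abstract does not define F1, so the formula, the function name `fisher_ratio_f1`, and the use of population variance are assumptions made for illustration and not the paper's implementation.

```python
from statistics import mean, pvariance


def fisher_ratio_f1(X, y):
    """Maximum Fisher's discriminant ratio over features (two classes).

    Assumed definition: for each feature,
        f = (mean_a - mean_b)^2 / (var_a + var_b),
    and F1 is the largest f across features. A higher value suggests
    that at least one feature separates the classes well.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a, b = labels

    best = 0.0
    for j in range(len(X[0])):
        col_a = [row[j] for row, lab in zip(X, y) if lab == a]
        col_b = [row[j] for row, lab in zip(X, y) if lab == b]
        num = (mean(col_a) - mean(col_b)) ** 2
        den = pvariance(col_a) + pvariance(col_b)  # population variance (assumption)
        if den == 0:
            # Zero within-class spread: perfectly separated if the means differ.
            ratio = float("inf") if num > 0 else 0.0
        else:
            ratio = num / den
        best = max(best, ratio)
    return best


# Toy data (hypothetical): feature 0 separates the classes, feature 1 overlaps.
X = [[0.0, 5.0], [2.0, 7.0], [10.0, 6.0], [12.0, 8.0]]
y = [0, 0, 1, 1]
f1_value = fisher_ratio_f1(X, y)
```

In this toy example the first feature dominates the result because its class means are far apart relative to the within-class spread, while the second feature contributes a much smaller ratio.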