Dynamic selection of normalization techniques using data complexity measures

Article ID	Journal	Published Year	Pages	File Type
6854952	Expert Systems with Applications	2018	38 Pages	PDF

Abstract

Data preprocessing is an important step in designing a classification model. Normalization is one of the preprocessing techniques used to handle out-of-bounds attributes. This work develops 14 classification models using different learning algorithms for dynamic selection of the normalization technique. It extracts 12 data complexity measures for 48 datasets drawn from the KEEL dataset repository. Each of these datasets is normalized using the min-max and z-score normalization techniques. The G-mean index is then estimated for the normalized datasets using a Gaussian Kernel Extreme Learning Machine (KELM) in order to determine the best-suited normalization technique for each dataset. The data complexity measures, together with the best-suited normalization technique, are used as input for developing the aforementioned dynamic models. These models predict the most suitable normalization technique from the estimated data complexity measures of a dataset. The results show that the models developed using Gaussian Kernel ELM (KELM) and Support Vector Machine (SVM) give promising results for most of the evaluated classification problems.
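
The labeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, G-mean is assumed to follow its usual definition as the geometric mean of per-class recall (which reduces to the square root of sensitivity times specificity for two classes), and the KELM classifier that would produce the predictions is left out.

```python
import math
from statistics import mean, pstdev


def min_max_normalize(column):
    """Rescale a numeric column linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: there is no spread to rescale, so map to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def z_score_normalize(column):
    """Center a column on its mean and scale by its population std. dev."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # Constant column: every value equals the mean.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of G-mean)."""
    recalls = []
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))


def pick_normalization(g_mean_by_technique):
    """Return the technique name with the highest G-mean.

    Hypothetical helper: ties go to the first entry in the mapping.
    """
    return max(g_mean_by_technique, key=g_mean_by_technique.get)
```

Under these assumptions, each dataset would be normalized both ways, classified, and scored with `g_mean`; the name returned by `pick_normalization` would then serve as the target label that is paired with the dataset's complexity measures when training the selection models.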

Keywords

Data preprocessing; Data complexity