Article ID: 6904055
Journal: Applied Soft Computing
Published Year: 2018
Pages: 67
File Type: PDF
Abstract
Enhancing distance measures is key to improving the performance of instance-based learning (IBL) and many other machine learning (ML) algorithms. The value difference metric (VDM) and the inverted specific-class distance measure (ISCDM) are among the top-performing distance measures for nominal attributes. Both use conditional probability terms to estimate the distance between nominal values, so their accuracy depends mainly on how accurately these terms are estimated, which is difficult when training data is scarce. In this study, different metaheuristic approaches are used to find better estimates of these terms for VDM and ISCDM independently. We transform the conditional probability estimation problem into an optimization problem and exploit three metaheuristic approaches to solve it: multi-parent differential evolution (MPDE), genetic algorithms (GA), and simulated annealing (SA). The objective function maximizes the classification accuracy of the k-nearest neighbors (kNN) algorithm. We propose a new fine-tuning method, which we name the modified selective fine-tuning (MSFT) method; a new hybrid fine-tuning method (i.e., a combination of two fine-tuning methods); and three ways of creating initial populations by manipulating the original conditional probability estimates used in VDM and ISCDM and the fine-tuned estimates obtained from other fine-tuning methods. We compare all approaches against the original distance measures on 53 general benchmark datasets. The experimental results show that the proposed methods significantly improve the classification and generalization accuracy of both VDM and ISCDM.
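The VDM mentioned above compares two nominal values through the per-class conditional probabilities P(class | value). A minimal sketch of this idea is given below; it is an illustration, not the authors' implementation, and the exponent q = 2 and the frequency-based probability estimates are assumptions (the paper's fine-tuning methods would adjust these estimated terms).

```python
from collections import Counter, defaultdict

def conditional_probs(values, labels):
    """Estimate P(class | value) for one nominal attribute by counting
    class frequencies within each attribute value (an assumed baseline
    estimator; the paper fine-tunes such terms via metaheuristics)."""
    counts = defaultdict(Counter)  # value -> class -> count
    for v, c in zip(values, labels):
        counts[v][c] += 1
    return {v: {c: n / sum(cc.values()) for c, n in cc.items()}
            for v, cc in counts.items()}

def vdm(probs, v1, v2, classes, q=2):
    """Value difference between two nominal values:
    sum over classes of |P(c|v1) - P(c|v2)|**q."""
    return sum(abs(probs[v1].get(c, 0.0) - probs[v2].get(c, 0.0)) ** q
               for c in classes)
```

For example, with values ['red', 'red', 'blue', 'blue'] and labels [0, 1, 0, 0], the estimates are P(0|red) = P(1|red) = 0.5 and P(0|blue) = 1.0, giving vdm(probs, 'red', 'blue', {0, 1}) = 0.25 + 0.25 = 0.5.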
Related Topics
Physical Sciences and Engineering; Computer Science; Computer Science Applications