A metaheuristic optimization framework for informative gene selection

Article ID	Journal	Published Year	Pages	File Type
4960309	Informatics in Medicine Unlocked	2016	11 Pages	PDF

Abstract

• Because the method is iterative, it searches a larger, randomly generated space and selects a gene subset that is close to the globally optimal solution.
• HS tunes the search by ranking the solutions with respect to their fitness. The improvised solutions generated by HS-GA-SVM are used for adding relevant genes.
• The proposed algorithm does not restrict the search to a predefined number of genes, because gene subsets are selected probabilistically.
• The most relevant genes, present in almost 90% of runs, were grouped to form the informative genes selected by the proposed algorithm, since they are the most frequently selected solutions in the final subsets.
• The performance of the informative gene subsets selected by the proposed hybrid HS-GA-SVM model was evaluated on five datasets using six probabilistic measures: BCR, F-measure, JI, ARI, NMI, and Purity.
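The frequency-based grouping mentioned in the highlights can be illustrated with a minimal sketch. It assumes each run returns a set of gene identifiers and that "almost 90% of runs" is read as a 0.9 cutoff. The function name `frequent_genes`, the `min_fraction` parameter, and the gene ids are hypothetical and not taken from the paper.

```python
from collections import Counter


def frequent_genes(runs, min_fraction=0.9):
    """Return the genes that appear in at least `min_fraction` of the runs.

    `runs` is a list of gene subsets, one iterable of gene ids per run.
    The 0.9 default is an assumption based on the "almost 90% of runs" wording.
    """
    if not runs:
        return []
    # Count each gene at most once per run.
    counts = Counter(gene for subset in runs for gene in set(subset))
    n_runs = len(runs)
    return sorted(g for g, c in counts.items() if c / n_runs >= min_fraction)


# Hypothetical example: 10 runs over made-up gene ids.
runs = [{"g1", "g2", "g7"}] * 9 + [{"g2", "g5"}]
informative = frequent_genes(runs)
print(informative)  # ['g1', 'g2', 'g7']
```

Here `g5` is dropped because it appears in only one of the ten runs, while `g1` and `g7` are kept because they appear in nine.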

This paper presents a metaheuristic framework that combines Harmony Search (HS) with a Genetic Algorithm (GA) for gene selection. The proposed model works in two phases. In the first phase, HS is hybridized with GA to compute and evaluate the fitness of randomly selected solutions encoded as binary strings, and HS then ranks the solutions in descending order of fitness. In the second phase, offspring are generated using the crossover and mutation operations of GA, and only those offspring whose fitness, as evaluated by an SVM classifier, exceeds that of their parents are selected for the next generation. The accuracy of the final gene subsets obtained from this model is evaluated using SVM classifiers. The merit of the approach is analyzed through experiments on five benchmark datasets, and the results show improved accuracy over existing feature selection approaches. The occurrence of the gene subsets selected by the model has also been computed, and the most frequently selected gene subsets, with a probability in [0.1-0.9], have been chosen as the optimal sets of informative genes. Finally, the performance of the selected informative gene subsets has been measured and established through probabilistic measures.
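The two phases described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. The ranking step only stands in for the HS phase, since harmony memory consideration and pitch adjustment are not modelled. The SVM-based fitness is replaced by a toy function so the example needs no external libraries. All names and parameter values (`evolve`, `toy_fitness`, `INFORMATIVE`, the crossover and mutation rates) are hypothetical.

```python
import random


def evolve(fitness, n_genes, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Rank binary gene masks by fitness, breed them, keep improved offspring."""
    rng = random.Random(seed)
    # Random binary strings: bit i == 1 means gene i is selected.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Phase 1: evaluate and rank solutions in descending order of fitness.
        pop.sort(key=fitness, reverse=True)
        next_pop = []
        # Phase 2: breed neighbouring pairs in the ranking.
        for i in range(0, pop_size - 1, 2):
            p1, p2 = pop[i], pop[i + 1]
            c1, c2 = p1[:], p2[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)  # one-point crossover
                c1 = p1[:cut] + p2[cut:]
                c2 = p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for j in range(n_genes):
                    if rng.random() < mutation_rate:
                        child[j] ^= 1  # bit-flip mutation
            # An offspring survives only if it is fitter than both parents.
            best_parent = max(fitness(p1), fitness(p2))
            for parent, child in ((p1, c1), (p2, c2)):
                next_pop.append(child if fitness(child) > best_parent else parent)
        if pop_size % 2:
            next_pop.append(pop[-1])  # carry the unpaired solution over
        pop = next_pop
    return max(pop, key=fitness)


# Toy stand-in for SVM accuracy (assumption): reward a hidden set of
# "informative" gene indices and lightly penalise large subsets.
INFORMATIVE = {1, 4, 7}


def toy_fitness(bits):
    return sum(bits[i] for i in INFORMATIVE) - 0.1 * sum(bits)


best = evolve(toy_fitness, n_genes=12)
selected = [i for i, bit in enumerate(best) if bit]
print(selected, toy_fitness(best))
```

In a realistic setting, `toy_fitness` would be replaced by the cross-validated accuracy of an SVM trained on the columns selected by the mask. Because a parent is replaced only by a strictly fitter offspring, the best fitness in the population never decreases between generations.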

Keywords

Harmony Search algorithm; Genetic algorithm; Gene selection; Metaheuristic; SVM