Article ID | Journal ID | Publication year | English article | Full-text version |
---|---|---|---|---|
536337 | 870500 | 2015 | 7-page PDF | Free download |

Gene expression data typically contain few samples (each experiment is costly) and thousands of expression values (features) captured by automated robotic devices. Feature selection is an important and challenging task for this kind of data: many traditional methods have failed where evolutionary methods have succeeded. In this study, the initial datasets are preprocessed with a fast quartile-based heuristic that removes raw features of little relevance for assigning samples to either class. Hamming distance is then introduced as a proximity measure for updating particle velocities in a binary particle swarm optimization (PSO) framework, referred to as HDBPSO, which selects the important feature subsets. Experimental results on three benchmark datasets, namely colon cancer, diffuse B-cell lymphoma, and leukemia, are evaluated by classification accuracy and by validity indices. Detailed comparative studies are also reported to demonstrate the effectiveness of the proposed method. The study shows that, with a suitable preprocessing method followed by fine-tuning with HDBPSO using Hamming distance as the proximity measure, it is possible to find important feature subsets in gene expression data with competitive or better performance.
Journal: Pattern Recognition Letters - Volume 52, 15 January 2015, Pages 94–100
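
The abstract does not give the exact update rule, so the following is only a minimal sketch of how Hamming distance might enter a binary PSO velocity update. It assumes that the attraction toward the personal and global best positions is scaled by the normalized Hamming distance to each of them; this scaling, the function names `hamming_distance` and `update_particle`, and the parameter values `w`, `c1`, `c2`, and `v_max` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def hamming_distance(a, b):
    """Count the positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def update_particle(position, velocity, pbest, gbest, rng,
                    w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One binary PSO step for a single particle (illustrative only).

    Each position bit marks a feature as selected (1) or dropped (0).
    Assumption: the pull toward pbest and gbest is weighted by the
    normalized Hamming distance to each, so particles far from the
    bests move more aggressively. The paper's actual rule may differ.
    """
    n = len(position)
    d_p = hamming_distance(position, pbest) / n  # distance to personal best
    d_g = hamming_distance(position, gbest) / n  # distance to global best

    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        v = (w * v
             + c1 * rng.random() * d_p * (p - x)
             + c2 * rng.random() * d_g * (g - x))
        v = max(-v_max, min(v_max, v))       # clamp velocity to [-v_max, v_max]
        prob = 1.0 / (1.0 + math.exp(-v))    # sigmoid gives P(bit = 1)
        new_velocity.append(v)
        new_position.append(1 if rng.random() < prob else 0)
    return new_position, new_velocity


# Hypothetical six-feature example; all values are made up for illustration.
rng = random.Random(0)
position = [1, 0, 1, 1, 0, 0]
velocity = [0.0] * 6
pbest = [1, 1, 0, 1, 0, 0]
gbest = [0, 1, 0, 1, 0, 1]
position, velocity = update_particle(position, velocity, pbest, gbest, rng)
```

In a full feature-selection loop, each updated bit vector would be scored by a fitness function (for example, classifier accuracy on the selected features) before `pbest` and `gbest` are refreshed; that part is omitted here because the abstract does not describe it.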