Article ID: 10127868
Journal: Computers & Industrial Engineering
Published Year: 2018
Pages: 14
File Type: PDF
Abstract
Analyzing imbalanced datasets is a critical and challenging task in data mining, since it requires special treatment for clusters of different sizes. Imbalanced datasets are common in domains such as medicine. This study proposes a classification algorithm based on the information granulation (IG) concept for handling imbalanced datasets. The proposed algorithm assembles data from the majority classes into granules to balance the class ratio within the data. It works in two stages. The first stage generates a set of IGs using metaheuristic-based automatic clustering algorithms, including dynamic clustering using particle swarm optimization (DCPSO), genetic algorithm K-means (GA K-means), and artificial bee colony K-means (ABC K-means). The second stage applies a classification algorithm to classify the data. The proposed algorithm is verified using both balanced and imbalanced benchmark datasets. Simulation results show that the proposed algorithms achieve promising classification results. Furthermore, this study applies the proposed algorithms to a prostate cancer prognosis classification problem, where they are employed to predict the survival rate of prostate cancer patients based on medical data. The results show that the proposed algorithms have a lower error rate.
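The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain Lloyd's K-means stands in for the metaheuristic clustering methods (DCPSO, GA K-means, ABC K-means), the number of granules is fixed by hand rather than found automatically, and a 1-nearest-neighbour rule stands in for the unspecified classifier. The function names (kmeans, granulate, predict_1nn) and the synthetic data are hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means returning k centroids.

    Assumption: a simple stand-in for DCPSO / GA K-means / ABC K-means.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct samples
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def granulate(X, y, majority_label, n_granules, seed=0):
    """Stage 1: replace the majority-class samples with n_granules centroids."""
    majority = [x for x, label in zip(X, y) if label == majority_label]
    rest = [(x, label) for x, label in zip(X, y) if label != majority_label]
    granules = kmeans(majority, n_granules, seed=seed)
    X_bal = [x for x, _ in rest] + granules
    y_bal = [label for _, label in rest] + [majority_label] * len(granules)
    return X_bal, y_bal


def predict_1nn(X_train, y_train, x):
    """Stage 2: classify x by its nearest neighbour in the balanced data.

    Assumption: 1-NN is used only because the abstract names no classifier.
    """
    i = min(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
    return y_train[i]


# Synthetic imbalanced data: 40 majority samples (label 0), 5 minority (label 1).
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(5)]
y = [0] * 40 + [1] * 5

# Granulate the majority class down to 5 granules, matching the minority size.
X_bal, y_bal = granulate(X, y, majority_label=0, n_granules=5)
print(y_bal.count(0), y_bal.count(1))  # class counts after granulation: 5 5
print(predict_1nn(X_bal, y_bal, (8.0, 8.0)))
```

After granulation each class contributes the same number of training points, so the second-stage classifier is no longer dominated by the majority class. An automatic clustering method such as those named in the abstract would choose the number of granules itself instead of taking it as a parameter.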
Related Topics
Physical Sciences and Engineering > Engineering > Industrial and Manufacturing Engineering