Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
495252 | 862821 | 2015 | 8-page PDF | Free download |
• An improved approach for attribute clustering based on the GGA is proposed.
• It can speed up classification time and reduce cost by selecting a feature subset.
• It can replace missing values by other attributes in the same clusters.
• Experiments show that the proposed approach is efficient on a real dataset.
Feature selection is a pre-processing step in data mining and machine learning, and is very important in analyzing high-dimensional data. Attribute clustering has been proposed for feature selection. If similar attributes can be clustered into groups, an attribute whose values are missing can easily be replaced by another attribute in the same group. Hong et al. proposed a genetic algorithm (GA) to find appropriate attribute clusters. However, in their approach, multiple chromosomes represent the same attribute clustering result (feasible solution) due to the combinatorial property, and thus the search space is larger than necessary. This study improves the performance of the GA-based attribute clustering process by using the grouping genetic algorithm (GGA). In the proposed approach, the general GGA representation and operators are used to reduce redundancy in the chromosome representation for attribute clustering. Experiments are also conducted to compare the efficiency of the proposed approach with that of an existing approach. The results indicate that the proposed approach can derive attribute grouping results effectively.
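The redundancy mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple encoding in which each gene holds the cluster label of one attribute; this encoding, and the helper names `canonical` and `count_encodings`, are hypothetical and chosen only for illustration, and they are not necessarily the representation used by Hong et al. or by the proposed GGA. The sketch shows that permuting cluster labels yields different chromosomes for the same clustering, and counts how many chromosomes collapse onto each distinct clustering in a small case.

```python
from itertools import product


def canonical(labels):
    """Relabel clusters in order of first appearance.

    Two label vectors describe the same clustering exactly when
    their canonical forms are equal.
    """
    mapping = {}
    relabeled = []
    for group in labels:
        if group not in mapping:
            mapping[group] = len(mapping)
        relabeled.append(mapping[group])
    return tuple(relabeled)


def count_encodings(n_attributes, k):
    """Count label-vector chromosomes that use exactly k clusters,
    and the number of distinct clusterings they represent."""
    chromosomes = [
        c for c in product(range(k), repeat=n_attributes)
        if len(set(c)) == k
    ]
    distinct = {canonical(c) for c in chromosomes}
    return len(chromosomes), len(distinct)


# Hypothetical example: 4 attributes, 3 clusters.
a = (0, 0, 1, 2)
b = (2, 2, 0, 1)  # same grouping as `a`, with cluster labels permuted
same = canonical(a) == canonical(b)

total, unique = count_encodings(4, 3)
print(same, total, unique)
```

Under this assumed encoding, every clustering into k non-empty groups is represented by k! chromosomes, which is the kind of enlarged search space that a group-oriented representation is intended to avoid.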
The results of the GGA-based attribute clustering algorithm are compared with those of GA-based attribute clustering on the SPECT dataset. In the experiment, the initial population size P was set to 20, the mutation rate Pm was set to 0.05, 18 features were used as the input feature set, and 4 irrelevant features were not considered. The objective is to select four features for classification. The experiments were run 10 times for each algorithm. The figure shows the comparison between our approach and the GA-based one. The fitness value trend shows that the GGA-based approach performs better than the GA-based one.
Journal: Applied Soft Computing - Volume 29, April 2015, Pages 371–378