Article ID Journal Published Year Pages File Type
495316 Applied Soft Computing 2014 11 Pages PDF
Abstract

•Our hybrid intelligent model considers the use of filter- and wrapper-based feature selection methods.•Three qualitative principles are highlighted.•The usefulness of our model is demonstrated using relative cluster validities.•Better use a subset of salient features for analyzing clinical diagnoses in performing clustering.

Models based on data mining and machine learning techniques have been developed to detect the disease early or assist in clinical breast cancer diagnoses. Feature selection is commonly applied to improve the performance of models. There are numerous studies on feature selection in the literature, and most of the studies focus on feature selection in supervised learning. When class labels are absent, feature selection methods in unsupervised learning are required. However, there are few studies on these methods in the literature. Our paper aims to present a hybrid intelligence model that uses the cluster analysis techniques with feature selection for analyzing clinical breast cancer diagnoses. Our model provides an option of selecting a subset of salient features for performing clustering and comprehensively considers the use of most existing models that use all the features to perform clustering. In particular, we study the methods by selecting salient features to identify clusters using a comparison of coincident quantitative measurements. When applied to benchmark breast cancer datasets, experimental results indicate that our method outperforms several benchmark filter- and wrapper-based methods in selecting features used to discover natural clusters, maximizing the between-cluster scatter and minimizing the within-cluster scatter toward a satisfactory clustering quality.

Graphical abstractFigure optionsDownload full-size imageDownload as PowerPoint slide

Related Topics
Physical Sciences and Engineering Computer Science Computer Science Applications
Authors
,