Article ID: 483750
Journal: Karbala International Journal of Modern Science
Published Year: 2015
Pages: 12
File Type: PDF
Abstract

The data mining literature offers a number of clustering techniques, but even when an effective technique is applied, its results can prove unreliable, and the efficacy of the technique comes under scrutiny. This paper proposes an integrated framework that ensures the reliability of the class labels assigned to a dataset whose class labels are unknown. The model uses PSO-k-means, k-medoids, c-means and Expectation Maximization for data clustering, and integrates their results through a majority-voting cluster ensemble to enhance reliability. These reliable outcomes serve as the training set for classification with a Bayesian classifier, a Multilayer Perceptron, a Support Vector Machine and a Decision Tree. The class labels predicted by the majority of classifiers, combined through a bagging classifier ensemble, are added to the training set, and the combined set is designated as the set with known class labels. Heterogeneous datasets with unknown class labels but a known number of classes, once treated through this model, can be assigned class labels for a significant portion of the data, and these labels may be accepted as reliable. Evaluation uses Dunn's, Davies–Bouldin and Modified Goodman–Kruskal indices for internal validation, together with probabilistic measures such as Normalized Mutual Information, Normalized Variation of Information and the Adjusted Rand Index, which are appropriate measures of the goodness-of-fit and robustness of the final clusters. The predictive capacity of the model is also validated through probabilistic measures and external indices such as Purity, the Rand Index and the F-measure.
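The abstract does not spell out how the four clusterings are combined into one vote. The following is a minimal sketch under stated assumptions, not the authors' procedure: cluster IDs from each algorithm are integers in 0..k-1, every partition is aligned to the first one by trying all k! relabelings (practical only for small k; a Hungarian-algorithm matching would scale better), and a point counts as reliably labelled only when at least `min_votes` of the aligned partitions agree. The names `align_labels`, `majority_vote` and `min_votes` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import List, Optional, Sequence


def align_labels(reference: Sequence[int], labels: Sequence[int], k: int) -> List[int]:
    """Relabel `labels` so they match `reference` on as many points as possible.

    Cluster IDs are arbitrary (cluster 0 in one run may be cluster 1 in
    another), so every permutation of the k IDs is tried. This brute force
    is only practical for small k.
    """
    best_perm: Sequence[int] = tuple(range(k))
    best_hits = -1
    for perm in permutations(range(k)):
        hits = sum(1 for r, lab in zip(reference, labels) if perm[lab] == r)
        if hits > best_hits:
            best_perm, best_hits = perm, hits
    return [best_perm[lab] for lab in labels]


def majority_vote(
    partitions: Sequence[Sequence[int]], k: int, min_votes: int
) -> List[Optional[int]]:
    """Consensus label per point, or None if fewer than `min_votes`
    aligned partitions agree (assumed rule for an "unreliable" point)."""
    reference = list(partitions[0])
    aligned = [reference] + [align_labels(reference, p, k) for p in partitions[1:]]
    consensus: List[Optional[int]] = []
    for votes in zip(*aligned):
        label, count = Counter(votes).most_common(1)[0]
        consensus.append(label if count >= min_votes else None)
    return consensus


# Four hypothetical partitions of six points with k = 2 (one per clustering algorithm).
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first run, IDs swapped
    [0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 1],
]
print(majority_vote(runs, k=2, min_votes=3))  # [0, 0, None, 1, 1, 1]
```

Under this assumption, points marked `None` are left out, and the remaining points play the role of the "reliable outcomes" that the abstract feeds to the classifiers as a training set.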
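Several of the measures named in the abstract (Purity, Rand Index, Adjusted Rand Index, Normalized Mutual Information) compare a clustering against reference labels. The plain-Python sketch below follows the standard textbook definitions; the paper may use different variants. In particular, NMI is normalized here by the geometric mean of the two entropies, which is only one of several conventions. All function names are illustrative.

```python
import math
from collections import Counter
from typing import Sequence


def _pairs(n: int) -> int:
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def purity(truth: Sequence, pred: Sequence) -> float:
    """Fraction of points that belong to the majority true class of their cluster."""
    best: dict = {}
    for (cluster, _cls), count in Counter(zip(pred, truth)).items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(truth)


def rand_index(truth: Sequence, pred: Sequence) -> float:
    """Share of point pairs on which both partitions agree (same vs. different group)."""
    n = len(truth)
    agree = sum(
        (truth[i] == truth[j]) == (pred[i] == pred[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return agree / _pairs(n)


def adjusted_rand_index(truth: Sequence, pred: Sequence) -> float:
    """Rand Index corrected for chance (Hubert-Arabie form).
    Undefined (division by zero) when both partitions are trivial."""
    sum_ij = sum(_pairs(c) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(_pairs(c) for c in Counter(truth).values())
    sum_b = sum(_pairs(c) for c in Counter(pred).values())
    expected = sum_a * sum_b / _pairs(len(truth))
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)


def _entropy(counts, n: int) -> float:
    return -sum(c / n * math.log(c / n) for c in counts)


def nmi(truth: Sequence, pred: Sequence) -> float:
    """Mutual information normalized by sqrt(H(truth) * H(pred)).
    Requires at least two groups in each partition (nonzero entropies)."""
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    mi = sum(
        c / n * math.log(c * n / (ct[t] * cp[p]))
        for (t, p), c in Counter(zip(truth, pred)).items()
    )
    return mi / math.sqrt(_entropy(ct.values(), n) * _entropy(cp.values(), n))


truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(purity(truth, pred), rand_index(truth, pred))  # 5/6 and 2/3
```

All four measures are invariant to renaming cluster IDs, which is why they suit clusterings whose labels carry no intrinsic meaning.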
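Among the internal indices listed in the abstract, Dunn's index is the simplest to state: the smallest distance between points in different clusters divided by the largest intra-cluster diameter, so larger values indicate compact, well-separated clusters. The sketch below assumes the classic form (Euclidean distance, single-linkage separation, complete diameter); the paper does not say which variant it uses, and `dunn_index` is an illustrative name.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def dunn_index(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Classic Dunn index: min inter-cluster distance / max cluster diameter.

    Assumes Euclidean distance, at least two clusters, and at least one
    cluster with two distinct points (otherwise the diameter is zero).
    """
    clusters: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Largest distance between two points of the same cluster.
    diameter = max(
        (math.dist(a, b) for members in clusters.values() for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    separation = min(
        math.dist(a, b)
        for ca, cb in combinations(list(clusters.values()), 2)
        for a in ca
        for b in cb
    )
    return separation / diameter


print(dunn_index([(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1]))  # 5.0
```

The pairwise loops make this quadratic in the number of points, which is acceptable for a sketch but slow on large datasets.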
