Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
403469 | 677241 | 2015 | 13-page PDF | Free download |
The use of software product metrics in defect prediction studies highlights the utility of these metrics. The public availability of software defect data based on product metrics has led to the development of defect prediction models. These models are limited in learning Defect-prone (D) modules because the available datasets are imbalanced: most are dominated by Not Defect-prone (ND) modules. This imbalance reduces the ability of classification models to learn D modules accurately. This paper presents an association mining based approach that allows defect prediction models to learn D modules in imbalanced datasets. The proposed algorithm preprocesses the data by setting specific metric values as missing, which improves the prediction of D modules. The algorithm has been evaluated using 5 public datasets. A Naive Bayes (NB) classifier was built before and after the proposed preprocessing, and its Recall improved after preprocessing. The stability of the approach was tested by running the algorithm with different numbers of bins. The results show a performance gain of up to 40%.
Journal: Knowledge-Based Systems - Volume 90, December 2015, Pages 1–13
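The abstract gives only a high-level description of the preprocessing, so the following Python sketch is one plausible reading and not the authors' algorithm. It assumes equal-width binning, a single-item rule of the form "metric bin implies ND", a hypothetical confidence threshold `min_conf`, and that values are masked only in D modules. All function names and the toy data are illustrative. Recall is computed in the standard way, as TP / (TP + FN) for the D class.

```python
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mask_nd_associated_bins(rows, labels, n_bins=3, min_conf=0.75):
    """Bin every metric, then set to None (missing) those values of D modules
    whose (metric, bin) item is strongly associated with the ND class.

    ASSUMPTION: the masking rule and the threshold are illustrative only;
    the abstract does not specify them.
    """
    n_metrics = len(rows[0])
    cols = [equal_width_bins([r[j] for r in rows], n_bins) for j in range(n_metrics)]
    binned = [[cols[j][i] for j in range(n_metrics)] for i in range(len(rows))]

    # Count how often each (metric, bin) item occurs overall and in ND modules.
    total, nd = Counter(), Counter()
    for row, label in zip(binned, labels):
        for j, b in enumerate(row):
            total[(j, b)] += 1
            if label == "ND":
                nd[(j, b)] += 1

    # Confidence of the rule "item -> ND" is nd / total.
    strong = {item for item in total if nd[item] / total[item] >= min_conf}

    return [
        [None if (label == "D" and (j, b) in strong) else b for j, b in enumerate(row)]
        for row, label in zip(binned, labels)
    ]


def recall(actual, predicted, positive="D"):
    """Recall of the positive class: TP / (TP + FN)."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp / (tp + fn) if tp + fn else 0.0


# Toy data: one metric (for example, lines of code) for eight modules.
rows = [[10], [12], [11], [13], [14], [90], [95], [100]]
labels = ["ND", "ND", "ND", "ND", "D", "D", "D", "ND"]
masked = mask_nd_associated_bins(rows, labels, n_bins=2)
```

In this toy run, the low bin is dominated by ND modules, so the single D module that falls in it has its value marked as missing. A classifier such as Naive Bayes that skips missing attributes would then not count that value as evidence about the D class. Rerunning with a different `n_bins` mirrors the kind of stability check the abstract mentions.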