Article ID Journal Published Year Pages File Type
6856277 Information Sciences 2018 26 Pages PDF
Abstract
Software defect prediction (SDP) involves using machine learning to locate bugs in source code. Datasets used for SDP are typically affected by an issue called class imbalance. Traditional learning algorithms do not perform well on class imbalanced datasets. Cost-sensitive learning has been used in SDP to minimise the monetary costs incurred by predictions. We propose a framework which produces cost-sensitive predictions and also mitigates class imbalance. Since our algorithm builds a decision forest classifier, knowledge can be extracted by manual inspection of the individual decision trees. To enhance this knowledge discovery process, we propose an algorithm for extracting the most interesting patterns from a decision forest. Our algorithm calculates interestingness as the potential financial gain of knowing the pattern. We then present a process which combines the above-mentioned techniques into an end-to-end cost-sensitive knowledge discovery process. This process is demonstrated by extracting knowledge from four software projects undertaken by the National Aeronautics and Space Administration (NASA).
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, ,