Article code: 495460
Journal code: 862827
Publication year: 2014
English article: 12-page PDF
Full-text version: free download
English title of the ISI article
Comparative analysis of statistical and machine learning methods for predicting faulty modules
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
English abstract


• We investigate how well static code metrics predict the fault proneness of software modules.
• Logistic regression and six machine learning methods (ANN, SVM, DT, CCN, GMDH and GEP) were evaluated for predicting the fault proneness of modules.
• The aim of this work is to find the best predictor of faulty modules.
• The validation of the methods is carried out using Receiver Operating Characteristic (ROC) analysis.
• We validate the performance of these methods on the public domain AR1 and AR6 data sets.

Demand for good quality software has grown rapidly in the last few years. This has led to increased use of machine learning methods for analyzing and assessing public domain data sets. These methods can be used to develop models for estimating software quality attributes such as fault proneness, maintenance effort, and testing effort. Predicting software faults in the early phases of development can guide practitioners to focus the available testing resources on the weaker areas of the software. This paper analyses and compares one statistical method (logistic regression) and six machine learning methods for fault prediction. The machine learning methods (Decision Tree, Artificial Neural Network, Cascade Correlation Network, Support Vector Machine, Group Method of Data Handling, and Gene Expression Programming) are empirically validated to find the relationship between static code metrics and the fault proneness of a module. To assess and compare the models built with logistic regression and the machine learning methods, we used two publicly available data sets, AR1 and AR6. We compared the predictive capability of the models using the Area Under the Curve (AUC), measured from Receiver Operating Characteristic (ROC) analysis. The study confirms the predictive capability of the machine learning methods for software fault prediction. The results show that the model built with the Decision Tree method has an AUC of 0.8 on AR1 and 0.9 on AR6, and that it is a better model than those built with logistic regression and the other machine learning methods.
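The AUC comparison described in the abstract can be illustrated with a minimal sketch. It assumes each model outputs a fault-proneness score per module and computes AUC through the rank-sum identity: the fraction of (faulty, non-faulty) module pairs in which the faulty module receives the higher score. The function name `roc_auc`, the two score lists, and the toy labels are hypothetical and are not taken from the paper or from the AR1 and AR6 data sets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: sequence of 0/1 values (1 = faulty module).
    scores: predicted fault-proneness, higher means more likely faulty.
    A tie between a faulty and a non-faulty module counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one faulty and one non-faulty module")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
model_a = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.3, 0.6]  # scores from one model
model_b = [0.6, 0.5, 0.4, 0.7, 0.2, 0.9, 0.3, 0.1]  # scores from another

print("model A AUC:", roc_auc(labels, model_a))
print("model B AUC:", roc_auc(labels, model_b))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of faulty from non-faulty modules, so the model with the higher value on the same data set is the better ranker under this criterion.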

Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 21, August 2014, Pages 286–297