Article ID: 554378
Journal: IERI Procedia
Published Year: 2014
Pages: 7 Pages
File Type: PDF
Abstract

Accurate fault prediction is an indispensable, even critical, activity in software engineering. In fault prediction model development research, combining metrics significantly improves the predictive capability of a model, but it also raises the issue of handling an increased number of predictors and of the nonlinearity that arises from complex interactions among metrics. Ordinary least squares (OLS) based parametric regression techniques cannot effectively model such nonlinearity with a large number of predictors, because the global parametric function needed to fit the data is not known beforehand. In our previous studies [1–3], we showed the impact of interaction in the combined-metrics approach to fault prediction and statistically established that interaction simultaneously increases the predictive accuracy of the model. In this study we use K-nearest neighbor (KNN) regression, otherwise well known in the data mining community for classification tasks, as an example of a nonparametric regression technique. Through the results derived here, we empirically establish and validate the hypothesis that the performance of KNN regression remains largely unaffected by an increasing number of interacting predictors while simultaneously providing superior performance over the widely used multiple linear regression (MLR).
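The contrast the abstract draws can be illustrated with a minimal sketch: a target driven by a nonlinear interaction between two predictors, fit with both MLR and KNN regression. The synthetic data, model settings, and scikit-learn usage below are illustrative assumptions, not the paper's actual datasets or experimental setup.

```python
# Hypothetical sketch (not the paper's setup): KNN regression vs. MLR on
# data whose target depends on a nonlinear interaction between metrics.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n = 1000
# Two synthetic "metrics"; their product (interaction) drives the target.
X = rng.uniform(-1, 1, size=(n, 2))
y = X[:, 0] * X[:, 1] + 0.05 * rng.normal(size=n)  # interaction + noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# MLR fits one global linear (parametric) surface; it cannot represent
# the x1*x2 interaction, so its fit is poor.
mlr = LinearRegression().fit(X_tr, y_tr)
r2_mlr = r2_score(y_te, mlr.predict(X_te))

# KNN regression averages the targets of the k nearest training points,
# adapting locally without assuming any global functional form.
knn = KNeighborsRegressor(n_neighbors=10).fit(X_tr, y_tr)
r2_knn = r2_score(y_te, knn.predict(X_te))

print(f"MLR R^2: {r2_mlr:.3f}")
print(f"KNN R^2: {r2_knn:.3f}")
```

On this kind of interaction-dominated data, the nonparametric KNN fit recovers most of the variance while the global linear fit explains almost none, which mirrors the abstract's argument for nonparametric regression when interactions are present.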

Related Topics
Physical Sciences and Engineering; Computer Science; Information Systems