Evaluating different families of prediction methods for estimating software project outcomes

Journal of Systems and Software, 2016, 17 pages

Abstract

• We compare classifiers using AUC when predicting software project outcome.
• Attribute selection using Information Gain improves our classifiers' performance.
• Statistical and ensemble classifiers are robust for predicting project outcome.
• Random Forest is the most appropriate technique for determining project outcome.
• The best predictions are achieved with team dynamics, process, and estimation attributes.
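The highlights rank classifiers by AUC. As a hedged illustration of that metric, the sketch below computes AUC directly as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half; this pairwise win rate equals the area under the ROC curve. The function name, classifier names, and scores are hypothetical toy values, not the study's data, tooling, or results.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve, computed as the Mann-Whitney pairwise win rate.

    labels: 0/1 outcomes (1 = positive class, e.g. a hypothetical "success").
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    # Each (positive, negative) pair scores 1 if the positive is ranked higher,
    # 0.5 on a tie, and 0 otherwise.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: NOT results from the study.
outcomes = [1, 1, 0, 0, 1, 0]
classifier_scores = {
    "classifier_a": [0.9, 0.7, 0.6, 0.2, 0.4, 0.4],
    "classifier_b": [0.8, 0.3, 0.5, 0.6, 0.2, 0.1],
}
results = {name: auc(outcomes, s) for name, s in classifier_scores.items()}
best = max(results, key=results.get)
for name, value in results.items():
    print(f"{name}: AUC = {value:.3f}")
print("best:", best)
```

The quadratic pairwise loop keeps the definition visible; for larger datasets a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.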

Software has been developed since the 1960s, but the success rate of development projects is still low. Classification models have been used to predict defects and estimate effort, but little work has been done on predicting the outcome of these projects. Previous research shows that it is possible to predict outcome using classifiers built on key variables observed during development, but it is not clear which techniques provide the most accurate predictions. We benchmark classifiers from different families to determine the outcome of a software project and to identify the variables that influence it. A survey-based empirical investigation was used to examine variables contributing to project outcome. Classification models were built and tested, and the best classifiers for this data were identified by comparing their area under the ROC curve (AUC) values. We then reduce the dimensionality of the data with Information Gain and rebuild the models with the same techniques. We use Information Gain and the classification techniques to identify key attributes and their relative importance. We find that four classification techniques provide good results for the survey data, regardless of dimensionality reduction. We conclude that Random Forest is the most appropriate technique for predicting project outcome. The key attributes we identify relate to communication, estimation, and process review.
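The abstract reduces dimensionality by ranking attributes with Information Gain, i.e. the drop in outcome entropy after splitting the data on an attribute: IG(Y; A) = H(Y) - sum over values v of P(A = v) * H(Y | A = v). A minimal sketch, assuming every survey attribute is categorical (numeric answers would need discretizing first); the function names, attribute names, data, and the choice of k are hypothetical, and this is not the authors' tooling.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(outcome; attribute) = H(outcome) - sum_v P(v) * H(outcome | v)."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(attributes, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(col, labels) for name, col in attributes.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical survey answers and outcomes, invented for illustration only.
outcome = ["success", "success", "failure", "failure"]
survey = {
    "team_size": ["small", "large", "small", "large"],
    "estimates_reviewed": ["yes", "yes", "no", "no"],
    "weekly_meetings": ["yes", "no", "no", "no"],
}
ranked = rank_attributes(survey, outcome)
top_k = [name for name, _ in ranked[:2]]  # keep the k most informative attributes
for name, gain in ranked:
    print(f"{name}: {gain:.3f}")
print("selected:", top_k)
```

On this toy data `estimates_reviewed` separates the outcomes perfectly (gain 1.0) while `team_size` carries no information (gain 0), so the reduced attribute set would feed the same classifiers in the rebuild step.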

Keywords

Classification techniques