Article ID: 572146
Journal: Accident Analysis & Prevention
Published Year: 2015
Pages: 6
File Type: PDF
Abstract

•Regression models can easily be overfitted to sample data.
•Suggestions are presented for ways to reduce the influence of overfitting.
•A dataset is progressively overfitted to show the resulting loss of generalizability.
•A number of steps are suggested for creating more robust models.

The prediction of on-road driving ability from off-road measures is a key aim of driving research. The primary goal of most classification models is to identify a small number of off-road variables that predict driving ability with high accuracy. Unfortunately, classification models are often overfitted to the study sample, which inflates predictive accuracy and leads to poor generalization to the relevant population and, thus, poor validity. Many driving studies do not report enough detail to judge the risk of overfitting, and few report any validation technique, even though validation is critical for testing a model's generalizability. After reviewing the literature, we built a model on a moderately large sample (n = 279) using best-practice regression modelling techniques. By then randomly drawing progressively smaller subsamples, we show that a low ratio of participants to independent variables can produce overfitted models and spurious conclusions about model accuracy. We conclude that more stable models can be constructed by following a few guidelines.
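The effect of a low participants-to-variables ratio can be reproduced in miniature. The sketch below is a hypothetical simulation, not the authors' analysis or data. It assumes 10 pure-noise predictors (so the true accuracy of any model is 50%), borrows n = 279 only as the size of the full sample, and fits an ordinary least-squares classifier on ±1 labels (a linear probability model) in place of the logistic or discriminant models common in this literature, so that it runs on the Python standard library alone. A large simulated test set stands in for a proper validation step such as cross-validation or an external sample. The function names `simulate`, `fit_least_squares`, `accuracy` and `progressive_subsamples` are illustrative.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(rows, labels):
    """OLS on +/-1 labels via the normal equations; a leading 1 adds the intercept."""
    design = [[1.0] + row for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, labels)) for i in range(k)]
    return solve(xtx, xty)


def accuracy(beta, rows, labels):
    """Share of cases where the sign of the fitted score matches the +/-1 label."""
    correct = 0
    for row, y in zip(rows, labels):
        score = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        if (1 if score >= 0 else -1) == y:
            correct += 1
    return correct / len(labels)


def simulate(n, n_predictors, rng):
    """Hypothetical pure-noise data: the pass/fail label ignores every predictor."""
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n_predictors)] for _ in range(n)]
    labels = [rng.choice((-1, 1)) for _ in range(n)]
    return rows, labels


def progressive_subsamples(sizes, n_predictors=10, n_full=279, n_test=2000, seed=1):
    """Fit on random subsamples of one full sample; report apparent vs held-out accuracy."""
    rng = random.Random(seed)
    rows, labels = simulate(n_full, n_predictors, rng)
    # A large independent test set stands in for a validation step.
    test_rows, test_labels = simulate(n_test, n_predictors, rng)
    results = []
    for n in sizes:
        idx = rng.sample(range(n_full), n)
        sub_rows = [rows[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        beta = fit_least_squares(sub_rows, sub_labels)
        results.append((n, n / n_predictors,
                        accuracy(beta, sub_rows, sub_labels),
                        accuracy(beta, test_rows, test_labels)))
    return results


if __name__ == "__main__":
    for n, ratio, apparent, held_out in progressive_subsamples([279, 100, 40, 20, 11]):
        print(f"n={n:3d}  participants/variable={ratio:5.1f}  "
              f"apparent={apparent:.2f}  held-out={held_out:.2f}")
```

Because the labels are pure noise, held-out accuracy stays near 50% at every n. Apparent (in-sample) accuracy, by contrast, tends to rise as participants per variable fall and is exactly 100% at n = 11, where 11 rows meet 11 fitted parameters (10 slopes plus an intercept). Only the validation estimate reveals that such accuracy is spurious.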
