Simplifying artificial neural network models of river basin behaviour by an automated procedure for input variable selection

Article ID	Journal	Published Year	Pages	File Type
380346	Engineering Applications of Artificial Intelligence	2015	15 Pages	PDF

Abstract

• An excessive number of input variables can reduce the efficiency of ANN simulation.
• A criterion was established by which input variables could be selected for exclusion.
• The proposed method satisfactorily identifies the most important variables.
• Simplification of the ANN improved performance of hydrological simulations.

This work presents a simplified, automated method for identifying and excluding unnecessary input variables, thereby reducing the dimensionality of ANN-based hydrological models. The proposed method is iterative and computationally efficient: it perturbs the input variables, records the change in model performance, computes an index of each variable's contribution to the ANN (the relative contribution index, RCI), and excludes the least influential variables, those whose index falls below a threshold. The method was used to simulate mean daily flow over the 20-year period 1989–2009 for four nested drainage basins in southern Brazil, ranging in area from 19.4 km² to 9426 km². Simplifying the ANN-based hydrological models in this way increased the Nash–Sutcliffe (NS) coefficient and reduced the RMSE in all the simulations undertaken. The potential of ANN models was therefore improved by eliminating unnecessary and/or redundant variables. For example, in simulating the intermediate basin (Santo Ângelo, area 5414 km²), the initial performance (12 inputs; NS=0.894) improved when a simpler, more parsimonious model was used (4 inputs; NS=0.944). To validate the simplification procedure, the proposed method (RCI) was compared with the well-known Overall Connection Weights (OCW) and Forward Stepwise Addition (FSA) methods. In most cases, RCI and OCW ordered the selected variables similarly, confirming that both procedures satisfactorily identify the more important variables, although RCI is computationally more efficient and gives a small advantage in the resulting model performance.
With the FSA method, the performance of the resulting models was also satisfactory, but the computational effort was much greater than with the other two methods because of the large number of neural network training runs required (for example, 117 training procedures in Combination 2, against only six for the RCI method).
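
The perturb, score, exclude and retrain loop described above can be illustrated with a minimal sketch. The abstract does not give the RCI formula, so the index below is an assumption: each input is scaled by 10%, the absolute change in RMSE is recorded, and the changes are normalised to sum to one. The function names (`relative_contribution`, `simplify`, `fit_linear`), the perturbation size, and the 0.05 threshold are all hypothetical, and an ordinary least-squares fit stands in for ANN training so that the example stays self-contained.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    """Root-mean-square error between observed and simulated series."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def relative_contribution(predict, X, y, scale=0.1):
    """Assumed index: normalised |change in RMSE| when one input is scaled."""
    base = rmse(y, predict(X))
    deltas = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] *= 1.0 + scale          # perturb a single input column
        deltas[j] = abs(rmse(y, predict(Xp)) - base)
    total = deltas.sum()
    return deltas / total if total > 0 else deltas

def fit_linear(X, y):
    """Stand-in for ANN training (assumption): ordinary least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda Z: Z @ coef

def simplify(X, y, fit, threshold=0.05, scale=0.1):
    """Retrain and drop inputs below the threshold until none are dropped."""
    active = list(range(X.shape[1]))
    while True:
        model = fit(X[:, active], y)
        rci = relative_contribution(model, X[:, active], y, scale)
        keep = [a for a, v in zip(active, rci) if v >= threshold]
        if len(keep) == len(active) or not keep:
            return active, rci           # indices kept and their index values
        active = keep
```

Called as `active, rci = simplify(X, y, fit_linear)`, the loop returns the column indices of the retained inputs together with their index values. Each pass trains one model, which is consistent with the small number of training runs the abstract reports for RCI relative to FSA. Replacing `fit_linear` with a real ANN training routine would be needed to reproduce anything like the reported results.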

Keywords

Hydrological simulation