Article ID: 6409220
Journal: Journal of Hydrology
Published Year: 2016
Pages: 17
File Type: PDF
Abstract

• A novel algorithm to identify alternate subsets of hydro-meteorological predictors.
• Relevance and redundancy of predictors are measured via information theoretic criteria.
• The algorithm is tested on synthetic datasets with known dependence relations.
• A streamflow prediction application gives insights into underlying physical processes.

This work investigates the uncertainty associated with the presence of multiple subsets of predictors that yield data-driven models with the same, or similar, predictive accuracy. To handle this uncertainty effectively, we introduce a novel input variable selection algorithm, called Wrapper for Quasi Equally Informative Subset Selection (W-QEISS), specifically conceived to identify all alternate subsets of predictors in a given dataset. The search process is based on a four-objective optimization problem that minimizes the number of selected predictors, maximizes the predictive accuracy of a data-driven model, and optimizes two information theoretic metrics, maximizing relevance and minimizing redundancy, so that the selected subsets are highly informative and have little intra-subset similarity. The algorithm is first tested on two synthetic test problems and then demonstrated on a real-world streamflow prediction problem in the Yampa River catchment (US). Results show that complex hydro-meteorological datasets are characterized by a large number of alternate subsets of predictors, which provides useful insights into the underlying physical processes. Furthermore, the presence of multiple subsets of predictors (and associated models) helps find a better trade-off between different measures of predictive accuracy commonly adopted in hydrological modelling problems.
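The four-objective search described in the abstract can be illustrated with a minimal Python sketch. It is not the W-QEISS implementation and makes several loud assumptions: a histogram plug-in estimator of mutual information, mRMR-style definitions of relevance (mean predictor-target mutual information) and redundancy (mean pairwise mutual information), a leave-one-out k-nearest-neighbour regressor as a stand-in for the data-driven model, and exhaustive enumeration in place of a multi-objective search algorithm, which is only feasible for a handful of candidate predictors. All function names (`discretize`, `relevance`, `redundancy`, `knn_r2`, `pareto_subsets`, `make_synthetic`) are hypothetical.

```python
# Hypothetical sketch of a four-objective predictor-subset search.
# Not the W-QEISS code: estimator, model and search strategy are assumptions.
import itertools
import math
import random
from collections import Counter


def discretize(values, bins=5):
    """Map continuous values to equal-width bin indices (assumed MI estimator)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant series
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(a, b):
    """Plug-in mutual information, in nats, between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def relevance(subset, X_disc, y_disc):
    """Mean MI between each selected predictor and the target (assumed mRMR-style)."""
    return sum(mutual_information(X_disc[j], y_disc) for j in subset) / len(subset)


def redundancy(subset, X_disc):
    """Mean pairwise MI among selected predictors; 0 for a single predictor."""
    pairs = list(itertools.combinations(subset, 2))
    if not pairs:
        return 0.0
    return sum(mutual_information(X_disc[i], X_disc[j]) for i, j in pairs) / len(pairs)


def knn_r2(subset, X, y, k=3):
    """Leave-one-out k-NN regression R^2, a stand-in for the wrapped model."""
    n = len(y)
    preds = []
    for t in range(n):
        dists = sorted(
            (sum((X[j][t] - X[j][s]) ** 2 for j in subset), s)
            for s in range(n) if s != t
        )
        preds.append(sum(y[s] for _, s in dists[:k]) / k)
    mean_y = sum(y) / n
    ss_res = sum((yt - pt) ** 2 for yt, pt in zip(y, preds))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y)
    return 1.0 - ss_res / ss_tot


def objectives(subset, X, X_disc, y, y_disc):
    """The four objectives, each expressed as a quantity to minimize."""
    return (
        len(subset),                         # fewer predictors
        -knn_r2(subset, X, y),               # higher predictive accuracy
        -relevance(subset, X_disc, y_disc),  # higher relevance
        redundancy(subset, X_disc),          # lower redundancy
    )


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def pareto_subsets(X, y, bins=5):
    """Score every non-empty subset exhaustively and keep the non-dominated ones."""
    X_disc = [discretize(col, bins) for col in X]
    y_disc = discretize(y, bins)
    scored = {}
    for r in range(1, len(X) + 1):
        for subset in itertools.combinations(range(len(X)), r):
            scored[subset] = objectives(subset, X, X_disc, y, y_disc)
    return {s: f for s, f in scored.items()
            if not any(dominates(g, f) for g in scored.values())}


def make_synthetic(n=60, seed=0):
    """Toy data: x1 is a noisy copy of x0, x2 is noise, y depends on x0 only."""
    rng = random.Random(seed)
    x0 = [rng.uniform(0, 1) for _ in range(n)]
    x1 = [v + rng.gauss(0, 0.02) for v in x0]
    x2 = [rng.uniform(0, 1) for _ in range(n)]
    y = [v + rng.gauss(0, 0.05) for v in x0]
    return [x0, x1, x2], y


if __name__ == "__main__":
    X, y = make_synthetic()
    for subset, f in sorted(pareto_subsets(X, y).items()):
        print(subset, [round(v, 3) for v in f])
```

In this toy dataset `x0` and `x1` are near-interchangeable, which is the kind of quasi equally informative alternative the abstract refers to; whether both singletons survive on the front depends on the sampled noise. Because the number of predictors is itself an objective, small subsets are never dominated by larger ones, so the front keeps compact alternatives alongside larger, more accurate ones.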
