Article ID: 569569
Journal: Environmental Modelling & Software
Published Year: 2015
Pages: 19
File Type: PDF
Abstract

• We address how boundary and bandwidth issues affect the performance of PMI IVS.
• We develop approaches to improve the performance of PMI IVS for non-Gaussian and non-linear problems.
• Boundary-resistant methods exhibit greater success than methods focussed on boundary correction.
• The performance (selection accuracy) of PMI IVS improves when boundary issues are accounted for.
• Preliminary guidelines for bandwidth selection are developed for PMI IVS and successfully validated on two semi-real studies.

Input variable selection (IVS) is vital in the development of data-driven models. Among different IVS methods, partial mutual information (PMI) has shown significant promise, although its performance has been found to deteriorate for non-Gaussian and non-linear data. In this paper, the effectiveness of different approaches to improving PMI performance is investigated, focussing on boundary issues associated with bandwidth estimation. Boundary issues, associated with kernel-based density and residual computations within PMI, arise from the extension of symmetrical kernels beyond the feasible bounds of potential inputs, and result in an underestimation of kernel-based marginal and joint probability distribution functions in the PMI. In total, the effectiveness of 16 different approaches is tested on synthetically generated data and the results are used to develop preliminary guidelines for PMI IVS. By using the proposed guidelines, the correct inputs can be identified in 100% of trials, even if the data are highly non-linear or non-Gaussian.
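
The boundary issue described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce any of the 16 approaches tested in the paper. It assumes a one-dimensional input bounded below at zero (exponentially distributed samples), a Gaussian kernel, and a hand-picked bandwidth of 0.1. The function names `gaussian_kde_1d` and `reflected_kde_1d` are hypothetical, and reflection is used here only as one generic, well-known boundary correction. Near the bound, roughly half of each symmetric kernel falls outside the feasible range, so the plain estimate comes out at about half the true density there.

```python
import numpy as np

def gaussian_kde_1d(x_eval, samples, bandwidth):
    """Plain Gaussian kernel density estimate evaluated at the points x_eval."""
    z = (x_eval[:, None] - samples[None, :]) / bandwidth
    kernel_sum = np.exp(-0.5 * z**2).sum(axis=1)
    return kernel_sum / (len(samples) * bandwidth * np.sqrt(2.0 * np.pi))

def reflected_kde_1d(x_eval, samples, bandwidth, lower=0.0):
    """Reflection correction (illustrative): mirror the samples about the lower
    bound so that kernel mass leaking past the bound is folded back inside."""
    mirrored = 2.0 * lower - samples
    return (gaussian_kde_1d(x_eval, samples, bandwidth)
            + gaussian_kde_1d(x_eval, mirrored, bandwidth))

# Assumed test data: exponential samples, feasible range [0, inf), true density 1 at x = 0.
rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=5000)
bandwidth = 0.1  # assumption chosen for illustration, not a recommended value

at_boundary = np.array([0.0])
plain_at_0 = gaussian_kde_1d(at_boundary, samples, bandwidth)[0]
reflected_at_0 = reflected_kde_1d(at_boundary, samples, bandwidth)[0]

print(f"true density at 0:      {1.0:.3f}")
print(f"plain KDE at 0:         {plain_at_0:.3f}")
print(f"reflected KDE at 0:     {reflected_at_0:.3f}")
```

At the bound itself the reflected estimate is exactly twice the plain one, which removes most of the underestimation. Some bias remains because the true density is sloped at the boundary, and the size of that bias depends on the bandwidth, which is one reason boundary handling and bandwidth selection interact.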

Related Topics
Physical Sciences and Engineering > Computer Science > Software