A simple nonparametric index of bivariate association for environmental data exploration

Article ID	Journal	Published Year	Pages	File Type
4978263	Environmental Modelling & Software	2017	8 Pages	PDF

Abstract

A purely data-based index for detecting bivariate association is proposed for preliminary data exploration when seeking to model a dependent variable that is associated with a possibly large number of independent variables. No particular form of association between the dependent and independent variables is assumed. The proposed bivariate association index is the value p, the probability that a scatter plot created by an X-randomization will have a smaller mean nearest neighbour distance than the original scatter plot. The rationale is that randomizing an existing X-Y association will usually result in a scatter plot with a greater mean nearest neighbour distance, so a low p points to an association. The process is repeated for each of the other independent variables to give a specific p for each one. A subset of potentially informative independent variables is then obtained by noting all those with low p values; how small p should be is left to the user.
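The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "X-randomization" means a random permutation of the X values, that p is estimated as the fraction of permutations giving a smaller mean nearest neighbour distance, and that both variables are non-constant and rescaled to the unit interval so that neither axis dominates the distances. The function names, `n_perm`, `seed`, and `threshold` are hypothetical choices made for illustration.

```python
import numpy as np


def mean_nn_distance(x, y):
    """Mean Euclidean distance from each point (x_i, y_i) to its nearest neighbour."""
    pts = np.column_stack([x, y])
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's zero distance to itself
    return dist.min(axis=1).mean()


def rescale(v):
    """Map values linearly onto [0, 1] (assumption: v is not constant)."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())


def association_index(x, y, n_perm=500, seed=0):
    """Estimate p: the share of X-permutations whose scatter plot has a
    smaller mean nearest neighbour distance than the observed one."""
    rng = np.random.default_rng(seed)
    x, y = rescale(x), rescale(y)
    observed = mean_nn_distance(x, y)
    smaller = sum(
        mean_nn_distance(rng.permutation(x), y) < observed
        for _ in range(n_perm)
    )
    return smaller / n_perm


def screen_variables(X, y, threshold, **kwargs):
    """Return {column index: p} for the columns of X with p at or below threshold."""
    p_values = {j: association_index(X[:, j], y, **kwargs) for j in range(X.shape[1])}
    return {j: p for j, p in p_values.items() if p <= threshold}
```

Calling `association_index(x, y)` for a single pair returns the estimated p, and `screen_variables(X, y, threshold=...)` repeats this for every column of a hypothetical predictor matrix `X`. The threshold is deliberately a required argument, since the abstract leaves the choice of how small p should be to the user.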

Keywords

Randomization; Nonparametric; Nearest neighbour