Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4978263 | Environmental Modelling & Software | 2017 | 8 Pages | |
Abstract
A purely data-based index for detecting bivariate association is proposed for preliminary data exploration when seeking to model a dependent variable associated with a possibly large number of independent variables. No particular form of association between the dependent and independent variables is assumed. For a given independent variable X, the proposed bivariate association index is the value p: the probability that a scatter plot created by an X-randomization will have a smaller mean nearest neighbour distance than the original X-Y scatter plot. The rationale is that randomizing an existing X-Y association will usually result in a scatter plot with a greater mean nearest neighbour distance, so a low p points to a possible association. The process is then repeated for all other independent variables to give a specific p for each one. A subset of potentially informative independent variables is then obtained by noting all those with low p values; just how small p should be is left to the user.
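The index described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how p is computed, so the sketch assumes a Monte Carlo estimate over random permutations of X, Euclidean distances, and standardization of both variables before distances are measured. The function names `mean_nn_distance` and `association_p` and the parameters `n_perm` and `seed` are hypothetical.

```python
import numpy as np


def mean_nn_distance(x, y):
    """Mean Euclidean distance from each point to its nearest neighbour."""
    pts = np.column_stack([x, y])
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    return dist.min(axis=1).mean()


def association_p(x, y, n_perm=500, seed=0):
    """Estimate p: the fraction of X-randomized scatter plots whose mean
    nearest neighbour distance is smaller than that of the original plot."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assumption (not stated in the abstract): put both axes on a common
    # scale so that neither variable dominates the distance.
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()

    rng = np.random.default_rng(seed)
    observed = mean_nn_distance(xs, ys)
    smaller = 0
    for _ in range(n_perm):
        # Permuting X breaks any X-Y pairing but keeps both marginals.
        if mean_nn_distance(rng.permutation(xs), ys) < observed:
            smaller += 1
    return smaller / n_perm
```

Calling `association_p(X[:, j], y)` for each column `j` of a predictor matrix would give one p per independent variable. The brute-force distance matrix is quadratic in the number of points, so a spatial index would be a sensible substitute for large samples.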
Related Topics
Physical Sciences and Engineering
Computer Science
Software
Authors
Varvara Vetrova, Earl Bardsley