| Article ID | Journal | Published Year | Pages |
|---|---|---|---|
| 10327961 | Computational Statistics & Data Analysis | 2005 | 23 |
Abstract
In the framework of subset variable selection for regression, relevance measures based on the notion of mutual information are studied. Results on the estimation of this index of stochastic dependence in a continuous setting are first presented. They are grounded in kernel density estimation, which makes the overall estimation of the mutual information quadratic in the sample size. The behavior of the mutual information as a relevance measure is then empirically studied on several regression problems. The considered problems are artificially generated to contain irrelevant and redundant candidate explanatory variables as well as strongly nonlinear relationships. Next, still in a subset variable selection context, computationally more efficient approximations of the mutual information based on the notion of k-additive truncation are proposed. The 2- and 3-additive truncations appear to be of practical interest as relevance measures. The 2-additive truncation computes the approximate relevance of a set of potential predictors from the relevance values of the singletons and pairs it contains. The 3-additive truncation additionally involves the relevance values of the 3-element subsets. The lower the amount of redundancy among the candidate explanatory variables, the better these approximations. The sample behavior of the two resulting relevance measures is finally studied empirically on the previously generated nonlinear artificial regression problems.
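To make the 2-additive truncation concrete, the following is a minimal sketch (not the paper's code) of how the approximate relevance of a predictor subset could be assembled once singleton and pairwise mutual-information values have already been estimated. It uses the Möbius-transform view: the mass of a singleton is its relevance I({i}), the mass of a pair is the interaction I({i,j}) - I({i}) - I({j}), and all higher-order masses are truncated to zero; the relevance of a set is then the sum of the masses of its subsets of size at most two. The dictionaries `single` and `pair` are hypothetical containers for pre-computed relevance values.

```python
from itertools import combinations

def two_additive_relevance(subset, single, pair):
    """Approximate I(subset; Y) under the 2-additive truncation.

    single: dict mapping each variable i to its relevance I({i}; Y)
    pair:   dict mapping each sorted pair (i, j) to I({i, j}; Y)
    (both assumed to have been estimated beforehand, e.g. via
    kernel density estimation)
    """
    # Moebius masses of singletons: m({i}) = I({i})
    total = sum(single[i] for i in subset)
    # Moebius masses of pairs: m({i, j}) = I({i, j}) - I({i}) - I({j});
    # masses of subsets of size >= 3 are truncated to zero
    for i, j in combinations(sorted(subset), 2):
        total += pair[(i, j)] - single[i] - single[j]
    return total

# Illustrative (made-up) relevance values for three candidate predictors
single = {"a": 0.5, "b": 0.3, "c": 0.2}
pair = {("a", "b"): 0.7, ("a", "c"): 0.6, ("b", "c"): 0.4}

# On a 2-element set the truncation is exact: it returns I({a, b}) itself
print(two_additive_relevance({"a", "b"}, single, pair))
print(two_additive_relevance({"a", "b", "c"}, single, pair))
```

Note that only relevance values of singletons and pairs are ever needed, which is what makes the truncation computationally cheaper than estimating the joint mutual information of large subsets directly; the 3-additive variant would add an analogous correction term over 3-element subsets.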
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Ivan Kojadinovic
