Article ID: 4575044
Journal: Geoderma
Published Year: 2008
Pages: 9
File Type: PDF
Abstract

Digital soil mapping is currently experiencing a tremendous increase in available environmental covariates and resolution for spatial soil predictions, resulting in computational problems in terms of limited data handling capabilities of machine learning approaches. This is of particular importance when gridded spatial soil class maps are used as a basis for predictions containing large amounts of redundant instances and noisy information.

In this study we systematically analyze the effect of instance selection, which aims at reducing sample size, while preserving or even increasing prediction accuracy. On a soil class dataset with 95,000 instances we tested two sampling approaches in relation to parameter settings of decision tree based learning: proportional and disproportional stratified random sampling. An automated grid search approach was used to find the best performing parameter settings of the decision tree.

The results show that an appropriate sampling method in combination with a grid search method returns better results than those obtained when grid learning is applied without instance selection. Instance selection increases prediction accuracy especially if the frequency distribution of the soil classes is low compared to the surrounding area. However, instance selection does not help in pedological interpretation. Nevertheless, it is a valuable pre-processing method to handle large spatial high resolution datasets in digital soil class prediction in terms of accuracy and computational costs.

As suggested on the basis of the results of this study, spatially constrained instance selection as well as boundary based digital soil mapping in terms of soil taxonomic contrast should be investigated in future pedometric research.
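
The two sampling approaches named in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the toy class labels and the allocation rule for the disproportional case (a fixed cap per class) are assumptions made for illustration, since the abstract does not state how the strata were sized.

```python
import random
from collections import Counter, defaultdict


def group_by_class(instances, label_of):
    """Split instances into strata keyed by their class label."""
    strata = defaultdict(list)
    for inst in instances:
        strata[label_of(inst)].append(inst)
    return strata


def proportional_sample(instances, label_of, fraction, seed=0):
    """Draw the same fraction from every class (class frequencies are kept)."""
    rng = random.Random(seed)
    strata = group_by_class(instances, label_of)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))  # keep at least one per class
        sample.extend(rng.sample(members, k))
    return sample


def disproportional_sample(instances, label_of, per_class, seed=0):
    """Draw up to `per_class` instances from every class.

    Assumed allocation rule: a fixed cap per class, so rare classes are
    over-represented relative to their frequency in the full dataset.
    """
    rng = random.Random(seed)
    strata = group_by_class(instances, label_of)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = min(per_class, len(members))
        sample.extend(rng.sample(members, k))
    return sample


def class_counts(sample, label_of):
    """Count how many sampled instances fall into each class."""
    return dict(Counter(label_of(inst) for inst in sample))


def label_of(inst):
    """Return the class label of a toy instance (label, cell_id)."""
    return inst[0]


# Hypothetical toy dataset: three soil classes with very unequal frequencies.
data = (
    [("A", i) for i in range(900)]
    + [("B", i) for i in range(90)]
    + [("C", i) for i in range(10)]
)

prop = proportional_sample(data, label_of, fraction=0.1)
disp = disproportional_sample(data, label_of, per_class=20)

print(class_counts(prop, label_of))
print(class_counts(disp, label_of))
```

In the proportional sample the rare class C contributes a single instance, whereas the disproportional sample retains all ten of its instances. Either reduced sample could then be passed to a decision tree learner whose parameters are tuned by grid search, as the abstract describes.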
