Article code: 6409858
Journal code: 1332874
Publication year: 2015
English article: 10 pages, PDF
Title of the ISI article
A statistical learning framework for groundwater nitrate models of the Central Valley, California, USA
Related subjects
Engineering and Basic Sciences; Earth and Planetary Sciences; Earth-Surface Processes
Abstract


- We compared groundwater nitrate models for the Central Valley, California.
- The machine learning models were optimized within a statistical learning framework.
- All three models fitted complex patterns in the training data (R² = 0.94-1.0).
- Boosted regression trees had the highest testing R² (0.39) and the least bias.
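The highlights contrast R² on the training data with R² on the testing data. Below is a minimal sketch of that comparison, assuming R² is the usual coefficient of determination, 1 − SS_res/SS_tot (the source does not state its formula). `r_squared` is a hypothetical helper and the values are illustrative, not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more paired observations")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical values: a model that reproduces its training data exactly
# but misses on withheld testing data shows the train/test gap.
train_r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
test_r2 = r_squared([2.0, 4.0, 6.0], [3.0, 4.0, 5.0])
print(train_r2, test_r2)  # 1.0 0.75
```

Under this definition, testing R² can even fall below zero when the predictions are worse than simply using the mean of the observations.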

Summary
We used a statistical learning framework to evaluate the ability of three machine-learning methods to predict nitrate concentration in shallow groundwater of the Central Valley, California: boosted regression trees (BRT), artificial neural networks (ANN), and Bayesian networks (BN). Machine-learning methods can learn complex patterns in the data, but because of overfitting they may not generalize well to new data. The statistical learning framework uses cross-validation (CV) training and testing data, plus a separate hold-out data set for model evaluation, with the goal of optimizing predictive performance by controlling for model overfit. Ranked by both CV testing R² and hold-out R², prediction performance was BRT > BN > ANN. For each method we identified two models based on CV testing results: the model with the maximum testing R² and a version whose R² was within one standard error of the maximum (the 1SE model). The maximum-R² models yielded CV training R² values of 0.94-1.0. Cross-validation testing R² values, which indicate predictive performance, were 0.22-0.39 for the maximum-R² models and 0.19-0.36 for the 1SE models. Evaluation with hold-out data suggested that the 1SE BRT and ANN models predicted an independent data set better than the maximum-R² versions did, which is relevant to extrapolation by mapping. Scatterplots of predicted vs. observed hold-out data for the final models helped identify prediction bias, which was fairly pronounced for ANN and BN. Lastly, the models were compared with multiple linear regression (MLR) and a previous random forest regression (RFR) model. Whereas BRT results were comparable to RFR, MLR had a low hold-out R² (0.07) and explained less than half the variation in the training data. Spatial patterns of predictions by the final 1SE BRT model agreed reasonably well with previously observed patterns of nitrate occurrence in groundwater of the Central Valley.
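The choice between the maximum-R² model and the 1SE model can be sketched as a one-standard-error rule. This is a minimal sketch under stated assumptions. The standard error is taken as the fold-to-fold standard deviation of testing R² divided by the square root of the number of folds. Among candidates within one SE of the best, the least complex is chosen, because the source does not say how the 1SE model was picked when several qualify. `Candidate`, `complexity`, and `select_models` are hypothetical names, and the fold R² values are illustrative, not results from the study.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass
class Candidate:
    """One tuned model configuration (hypothetical structure)."""
    name: str
    complexity: int        # assumed complexity measure, e.g. number of trees
    fold_r2: list[float]   # CV testing R2 from each fold

    def mean_r2(self) -> float:
        return mean(self.fold_r2)

    def se_r2(self) -> float:
        # Standard error of the mean testing R2 across folds (assumption).
        return stdev(self.fold_r2) / sqrt(len(self.fold_r2))


def select_models(candidates):
    """Return (maximum-R2 model, 1SE model) from CV testing results."""
    best = max(candidates, key=lambda c: c.mean_r2())
    threshold = best.mean_r2() - best.se_r2()
    # Candidates whose mean testing R2 is within one SE of the maximum.
    eligible = [c for c in candidates if c.mean_r2() >= threshold]
    # Assumption: prefer the least complex eligible model.
    one_se = min(eligible, key=lambda c: c.complexity)
    return best, one_se


# Illustrative fold-level testing R2 values (not from the study).
candidates = [
    Candidate("A", complexity=100, fold_r2=[0.30, 0.34, 0.32]),
    Candidate("B", complexity=500, fold_r2=[0.33, 0.37, 0.35]),
    Candidate("C", complexity=2000, fold_r2=[0.34, 0.38, 0.36]),
]
best, one_se = select_models(candidates)
print(best.name, one_se.name)  # C B
```

Here C has the highest mean testing R² (0.36), but B (0.35) lies within one SE of it and is simpler, so B becomes the 1SE model. The rationale is the one the abstract gives: a slightly less tuned model may generalize better to an independent hold-out set.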

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Hydrology - Volume 531, Part 3, December 2015, Pages 902-911