Article code: 6458899
Journal code: 1421120
Publication year: 2016
English article: 10 pages
Full-text version: PDF
English title of the ISI article
The effect of tuning, feature engineering, and feature selection in data mining applied to rainfed sugarcane yield modelling
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Applications
Abstract


- Data-mining techniques were applied to data from sugarcane production.
- The impact of different approaches to include weather data was evaluated.
- The RReliefF algorithm is used to evaluate feature engineering.
- We evaluated the impact of tuning, feature selection, and feature engineering on prediction error.
- Sixty-six combinations were evaluated to quantify the impacts on model performance.
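The highlights state that RReliefF was used to evaluate feature engineering, but the listing does not say which implementation the authors used. The following is only a minimal Python sketch of the standard RReliefF weight update for a numeric target, under stated assumptions: numeric features, Manhattan distance on range-normalised features to find the k nearest neighbours, and equal neighbour weights d(i, j) = 1/k in place of rank-based weighting. The function name `rrelieff` and its parameters (`k`, `n_iter`, `seed`) are hypothetical, and the synthetic data in the usage lines are made up.

```python
import numpy as np


def rrelieff(X, y, k=10, n_iter=None, seed=0):
    """Hypothetical minimal RReliefF sketch for a numeric target.

    Returns one relevance weight per column of X; higher means the feature
    better separates instances with different target values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    m = n if n_iter is None else n_iter   # number of sampled instances
    k = min(k, n - 1)                     # at most n - 1 neighbours exist
    rng = np.random.default_rng(seed)

    # Range-normalise so that every diff() lies in [0, 1].
    x_min = X.min(axis=0)
    x_range = X.max(axis=0) - x_min
    x_range[x_range == 0] = 1.0           # constant columns: avoid division by zero
    y_range = (y.max() - y.min()) or 1.0
    Xn = (X - x_min) / x_range
    yn = (y - y.min()) / y_range

    n_dc = 0.0               # weighted sum of target differences
    n_da = np.zeros(p)       # weighted sums of attribute differences
    n_dcda = np.zeros(p)     # weighted sums of joint target-and-attribute differences

    for i in rng.choice(n, size=m, replace=m > n):
        # Manhattan distance to all instances; exclude the instance itself.
        dist = np.abs(Xn - Xn[i]).sum(axis=1)
        dist[i] = np.inf
        for j in np.argsort(dist)[:k]:
            d_ij = 1.0 / k                # equal neighbour weights (simplification)
            diff_c = abs(yn[i] - yn[j])
            diff_a = np.abs(Xn[i] - Xn[j])
            n_dc += diff_c * d_ij
            n_da += diff_a * d_ij
            n_dcda += diff_c * diff_a * d_ij

    # RReliefF weight: N_dCdA / N_dC - (N_dA - N_dCdA) / (m - N_dC)
    return n_dcda / n_dc - (n_da - n_dcda) / (m - n_dc)


# Usage on made-up data: only column 0 drives the target.
data_rng = np.random.default_rng(1)
X_demo = data_rng.random((200, 3))
y_demo = 10 * X_demo[:, 0] + 0.1 * data_rng.random(200)
weights = rrelieff(X_demo, y_demo, k=10)
```

On such data the informative column should receive a clearly positive weight while the pure-noise columns stay near or below zero, which is the kind of ranking the abstract relies on to explain why engineered features help.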

Crop yield models can assist decision makers throughout an agro-industrial supply chain, even with decisions unrelated to crop production itself. Given the characteristics of the mechanisms and data involved in yield, data mining techniques are suitable candidates for modelling. Using these techniques together with feature engineering, feature selection, and proper tuning can improve performance beyond what a simple replacement of multiple linear regression achieves. To evaluate the impact of each step in this workflow, we modelled sugarcane (Saccharum spp.) yield with data obtained from a sugarcane mill. We assessed final model performance for 66 combinations of six techniques, tuning, feature selection, and feature engineering. Averaged across all combinations, the mean absolute error (MAE) was 6.42 Mg ha−1. Averaged by technique, the MAE ranged from 4.57 to 8.80 Mg ha−1. The best and worst individual models had MAEs of 4.11 and 9.00 Mg ha−1, respectively. The weakest models performed close to simply predicting the average yield for each number of cuts (MAE of 9.86 Mg ha−1). On average, tuning reduced the MAE by 1.17 Mg ha−1 and feature engineering by 0.64 Mg ha−1. Feature selection removed nearly 40% of the features but increased the MAE by 0.19 Mg ha−1. Simple strategies, such as decomposing weather attributes and detailing fertilisation, improved model performance. Feature importance estimates from the RReliefF feature selection algorithm were used to explain these performance gains. Where empirical models are needed, they will rely on advanced techniques, but these will require proper algorithm tuning and feature engineering to extract most of the information from the datasets. Based on these results, we recommend following the presented workflow for developing yield models.
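The abstract compares models against a reference that simply predicts the average yield for each number of cuts, scored by MAE. The following is a minimal Python sketch of that baseline and of MAE, MAE = (1/n) Σ |y_i − ŷ_i| in Mg ha−1. The function names are hypothetical, the yields are made-up toy numbers (not the paper's data), and falling back to the overall training mean for a cut number absent from training is an assumption; the abstract does not describe how data were split.

```python
from collections import defaultdict


def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum |y_i - yhat_i|, in the units of y (here Mg ha^-1)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_cut_mean_baseline(cuts, yields):
    """Hypothetical baseline: predict the mean training yield per number of cuts."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(cuts, yields):
        sums[c] += y
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    overall = sum(yields) / len(yields)
    # Assumption: unseen cut numbers fall back to the overall training mean.
    return lambda c: means.get(c, overall)


# Usage on made-up yields (Mg ha^-1).
predict = fit_cut_mean_baseline([1, 1, 2, 2, 3], [100.0, 110.0, 90.0, 80.0, 70.0])
test_cuts = [1, 2, 3, 4]
test_yield = [108.0, 82.0, 75.0, 60.0]
preds = [predict(c) for c in test_cuts]   # [105.0, 85.0, 70.0, 90.0]
mae = mean_absolute_error(test_yield, preds)  # 10.25
```

Any candidate model can be scored with the same `mean_absolute_error` call, so its gain over this baseline is directly comparable in Mg ha−1.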

Publisher
Database: Elsevier - ScienceDirect
Journal: Computers and Electronics in Agriculture - Volume 128, October 2016, Pages 67-76
Authors
, ,