An analysis of the factors that influence sugarcane yield in Northern Argentina using classification and regression trees

Article ID	Journal	Published Year	Pages	File Type
4511245	Field Crops Research	2009	9 Pages	PDF

Abstract

Multi-location trials are commonly used to estimate the effects of different explanatory factors on crop yield. Alternatively, the analysis of production databases could also be useful for exploring and understanding such effects. Such data require flexible and robust methods for dealing with multivariate, non-linear and unbalanced data structures, high-order interactions and missing values. In this paper, we explore the issue of crop yield explanation using a 5-year period (1999–2005) of sugarcane (Saccharum officinarum L.) yield data from Northern Argentina. Using classification and regression trees (CART), a data mining technique, we show that farm membership (FARM) was among the main splitting factors for total cane per hectare (TCH) cluster variability. Crop class (AGE) was at the second level in the hierarchy, and values of AGE higher than 2.5 split the low and medium TCH clusters from the high ones. Sugarcane cultivar (VAR) was the most important explanatory factor regarding total sugar per hectare (TSH), and crop class (AGE) was second in importance. In this case, farm membership did not appear among the main splitting factors. The growth period duration, field area and precipitation did not show notable importance values for explaining final TCH and TSH values. By-year CART models also showed low importance values for weather-related variables across the years analyzed, suggesting that environmental conditions other than precipitation are controlling yearly variation in sugar and cane yield (e.g. radiation, water-use efficiency or temperature regime). The CART analysis developed here is the first systematic analysis of explanatory factors of biomass and sugar content in Argentina's most productive cane region. However, we believe this methodology could be applicable to a wider geographic area and other sugarcane production regions, as well as to other cropping systems.
Although regression trees provide less formal statistical inference, their results could complement traditional experimental analyses that use mixed models. They could also be useful for elaborating hypotheses and suggesting mechanistic studies to test them.
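To make the splitting idea concrete, the following is a minimal sketch of how a regression tree might choose a threshold on a single numeric predictor such as AGE. It is not the authors' implementation: the function names `sse` and `best_split` are hypothetical, the impurity measure (within-node sum of squared errors) is one common choice for regression trees, and the AGE and TCH values are invented for illustration, chosen so that the best cut happens to fall at 2.5.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (impurity of one node)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, impurity_reduction) for the best split x <= t."""
    parent = sse(y)
    levels = sorted(set(x))
    best_t, best_gain = None, 0.0
    # Candidate thresholds are midpoints between consecutive observed values.
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Invented toy data, NOT taken from the paper:
# crop class (AGE) and total cane per hectare (TCH, t/ha).
age = [1, 1, 2, 2, 3, 3, 4, 5]
tch = [92.0, 88.0, 85.0, 90.0, 70.0, 66.0, 62.0, 58.0]

threshold, gain = best_split(age, tch)
print(threshold, round(gain, 3))
```

With these toy values the search selects a threshold of 2.5, separating the younger, higher-yielding observations from the older ones. A full CART procedure would repeat this search over every predictor, including categorical ones such as FARM and VAR, and then recurse within each resulting node.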

Keywords

CART analysis; Data mining; Non-parametric method