Community structure models are improved by exploiting taxonomic rank with predictive clustering trees

Ecological Modelling, 2015, 11 pages

Abstract

• We build four types of community structure models for three different ecosystems.
• We explore how taxonomic rank and multi-species data influence model performance.
• Information about taxonomic rank improves the predictive performance of the models.
• Global models are easier to interpret and overfit less than local models.
• The best-performing method is hierarchical multi-label classification.

Community structure modelling studies the influence of biotic and abiotic factors on the abundance and composition of a given taxonomic group of organisms. With advances in measurement and sensor technology, the availability, precision and complexity of environmental data are constantly increasing. Nowadays, ecosystem measurements provide a complete snapshot of the state of the system, including information about the community structure of the organisms present in a given sample. These measurements include multi-species data, which are typically analysed by constructing a community model as a collection of models built separately for each species (local models), without considering the possible (taxonomic) relationships among species.

In this work, we propose to construct a single community structure model for all species (a global model) that can exploit these relationships. Namely, we investigate whether including additional information, in the form of taxonomic rank or of multiple species, helps to build better community structure models. More specifically, we use predictive clustering trees (a generalized form of decision trees) to build models for three practically relevant community structure modelling datasets: the microarthropod community living in the agricultural soils of Denmark, organisms living in Slovenian rivers, and vegetation found in the State of Victoria, Australia.

On each dataset, we compare the performance of four types of community structure models, which correspond to four machine learning tasks: single-species models without taxonomic rank correspond to single-label classification; single-species models with taxonomic rank correspond to hierarchical single-label classification; multi-species models without taxonomic rank correspond to multi-label classification; and multi-species models with taxonomic rank correspond to hierarchical multi-label classification.
The results of the experimental evaluation reveal that by using the taxonomic rank and the multi-species aspect of the data, we are able to learn better community structure models.
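As a rough illustration of how the four tasks differ in their targets, the Python sketch below does two things. It encodes the species present in a sample as a label vector that, once taxonomic rank is used, also marks every ancestor taxon. It then scores a candidate split the way a predictive clustering tree does for a global model, by the reduction in weighted variance over all target columns at once. The toy taxonomy, the species names, the hypothetical soil pH values, the weight base W0 = 0.75 and the helper names (encode, weighted_variance, variance_reduction) are all assumptions made for illustration. The depth-based weighting follows the general hierarchical multi-label formulation of predictive clustering trees and is not necessarily the setting used in this study.

```python
"""Hypothetical sketch: hierarchical multi-label targets and a PCT-style split score.

Assumptions (not from the paper): toy taxonomy, species names, W0 = 0.75,
and a plain weighted-variance-reduction heuristic.
"""
from typing import Dict, List, Optional, Sequence

import numpy as np

# Hypothetical taxonomy: child -> parent (None marks a top-level taxon).
TAXONOMY: Dict[str, Optional[str]] = {
    "Collembola": None,
    "Isotomidae": "Collembola",
    "Folsomia candida": "Isotomidae",
    "Isotoma viridis": "Isotomidae",
    "Acari": None,
    "Oribatida": "Acari",
    "Oppiella nova": "Oribatida",
}
CLASSES: List[str] = list(TAXONOMY)


def depth(taxon: str) -> int:
    """Number of ancestors above a taxon (top-level taxa have depth 0)."""
    d = 0
    parent = TAXONOMY[taxon]
    while parent is not None:
        d += 1
        parent = TAXONOMY[parent]
    return d


def encode(present: Sequence[str]) -> np.ndarray:
    """0/1 vector over CLASSES; every ancestor of a present taxon is also set."""
    vec = np.zeros(len(CLASSES))
    for taxon in present:
        node: Optional[str] = taxon
        while node is not None:
            vec[CLASSES.index(node)] = 1.0
            node = TAXONOMY[node]
    return vec


# Deeper (more specific) taxa get smaller weights: w(c) = W0 ** depth(c).
W0 = 0.75
WEIGHTS = np.array([W0 ** depth(c) for c in CLASSES])


def weighted_variance(Y: np.ndarray) -> float:
    """Mean weighted squared Euclidean distance of the rows of Y to their centroid."""
    if len(Y) == 0:
        return 0.0
    diff = Y - Y.mean(axis=0)
    return float(np.mean((diff ** 2) @ WEIGHTS))


def variance_reduction(Y: np.ndarray, mask: np.ndarray) -> float:
    """PCT-style split heuristic: drop in weighted variance after splitting by mask."""
    n = len(Y)
    left, right = Y[mask], Y[~mask]
    return weighted_variance(Y) - (
        len(left) / n * weighted_variance(left)
        + len(right) / n * weighted_variance(right)
    )


# Usage: four hypothetical soil samples and one candidate test on soil pH.
samples = [
    ["Folsomia candida"],
    ["Isotoma viridis", "Oppiella nova"],
    ["Oppiella nova"],
    [],
]
Y = np.vstack([encode(s) for s in samples])
ph = np.array([5.5, 6.0, 7.2, 7.5])  # hypothetical descriptive attribute
print(variance_reduction(Y, ph < 6.5))
```

A single tree that scores splits over all columns of Y at once corresponds to the global (hierarchical multi-label) setting. Keeping only the species columns gives the flat multi-label targets. Keeping one species together with its ancestors gives the hierarchical single-label case, and one species column alone gives single-label classification. Fitting one such tree per species instead corresponds to the local-model approach.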

Keywords

Taxonomic rank; Predictive clustering trees; Classification; Hierarchical multi-label classification