Article ID Journal Published Year Pages File Type
5741885 Ecological Informatics 2017 9 Pages PDF
Abstract

•We present six non-homogeneous consensus models to predict the presence-absence of marine phytoplankton species.•In most of the cases, the consensus models behaved better than the single-models that were used to construct them.•The single-models considered were generalized linear models, random forests, boosting and support vector machines.•Our results suggest that attention must be given to consensus methods when dealing with ecological prediction.

We performed different consensus methods by combining binary classifiers, mostly machine learning classifiers, with the aim to test their capability as predictive tools for the presence-absence of marine phytoplankton species. The consensus methods were constructed by considering a combination of four methods (i.e., generalized linear models, random forests, boosting and support vector machines). Six different consensus methods were analyzed by taking into account six different ways of combining single-model predictions. Some of these methods are presented here for the first time. To evaluate the performance of the models, we considered eight phytoplankton species presence-absence data sets and data related to environmental variables. Some of the analyzed species are toxic, whereas others provoke water discoloration, which can cause alarm in the population. Besides the phytoplankton data sets, we tested the models on 10 well-known open access data sets. We evaluated the models' performances over a test sample. For most (72%) of the data sets, a consensus method was the method with the lowest classification error. In particular, a consensus method that weighted single-model predictions in accordance with single-model performances (weighted average prediction error - WA-PE model) was the one that presented the lowest classification error most of the time. For the phytoplankton species, the errors of the WA-PE model were between 10% for the species Akashiwo sanguinea and 38% for Dinophysis acuminata. This study provides novel approaches to improve the prediction accuracy in species distribution studies and, in particular, in those concerning marine phytoplankton species.

Related Topics
Life Sciences Agricultural and Biological Sciences Ecology, Evolution, Behavior and Systematics
Authors
, , ,