Article ID Journal Published Year Pages File Type
4374768 Ecological Informatics 2016 7 Pages PDF
Abstract

•User-level data cleaning is seldom applied to biodiversity databases.•We present a new framework to quantify the effect of data cleaning on SDMs.•Data cleaning resulted in significant improvement in SDMs across all studied scales.•The largest SDM improvement following data cleaning was for small mammals (1 g–100 g).•We exemplify the value of case-specific, user-level data cleaning.

The recent availability of species occurrence data from numerous sources, standardized and connected within a single portal, has the potential to answer fundamental ecological questions. These aggregated big biodiversity databases are prone to numerous data errors and biases. The data-user is responsible for identifying these errors and assessing if the data are suitable for a given purpose. Complex technical skills are increasingly required for handling and cleaning biodiversity data, while biodiversity scientists possessing these skills are rare. Here, we estimate the effect of user-level data cleaning on species distribution model (SDM) performance. We implement several simple and easy-to-execute data cleaning procedures, and evaluate the change in SDM performance. Additionally, we examine if a certain group of species is more sensitive to the use of erroneous or unsuitable data. The cleaning procedures used in this research improved SDM performance significantly, across all scales and for all performance measures. The largest improvement in distribution models following data cleaning was for small mammals (1 g–100 g). Data cleaning at the user level is crucial when using aggregated occurrence data, and facilitating its implementation is a key factor in order to advance data-intensive biodiversity studies. Adopting a more comprehensive approach for incorporating data cleaning as part of data analysis, will not only improve the quality of biodiversity data, but will also impose a more appropriate usage of such data.

Related Topics
Life Sciences Agricultural and Biological Sciences Ecology, Evolution, Behavior and Systematics
Authors
, ,