Measurement error in network data: A re-classification

Article ID	Journal	Published Year	Pages	File Type
1129319	Social Networks	2012	14 Pages	PDF

Abstract

Research on measurement error in network data has typically focused on missing data. We embed missing data, which we term false negative nodes and edges, in a broader classification of error scenarios. This includes false positive nodes and edges and falsely aggregated and disaggregated nodes. We simulate these six measurement errors using an online social network and a publication citation network, reporting their effects on four node-level measures: degree centrality, clustering coefficient, network constraint, and eigenvector centrality. Our results suggest that in networks with more positively skewed degree distributions and higher average clustering, these measures tend to be less resistant to most forms of measurement error. In addition, we argue that the sensitivity of a given measure to an error scenario depends on the idiosyncrasies of the measure's calculation, thus revising the general claim from past research that the more ‘global’ a measure, the less resistant it is to measurement error. Finally, we anchor our discussion to commonly used networks in past research that suffer from these different forms of measurement error and make recommendations for correction strategies.
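To make two of the six error scenarios concrete, the following is a minimal sketch, not the authors' simulation code. It assumes a hypothetical seven-edge toy graph (two triangles joined by a bridge) rather than the empirical networks used in the article, and all function names are illustrative. It perturbs the true edge list with false negative edges (true ties that go unobserved) and false positive edges (spurious ties), then recomputes two of the four node-level measures, degree and the local clustering coefficient.

```python
import random
from itertools import combinations


def to_adj(nodes, edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree(adj):
    """Degree of each node (number of neighbours)."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def clustering(adj):
    """Local clustering: share of a node's neighbour pairs that are linked."""
    out = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            out[v] = 0.0  # undefined for k < 2; set to 0 by convention here
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        out[v] = 2.0 * links / (k * (k - 1))
    return out


def drop_edges(edges, p, rng):
    """False negative edges: each true edge is missed with probability p."""
    return [e for e in edges if rng.random() >= p]


def add_edges(nodes, edges, n_extra, rng):
    """False positive edges: add n_extra spurious ties between unlinked pairs."""
    present = {frozenset(e) for e in edges}
    candidates = [pair for pair in combinations(nodes, 2)
                  if frozenset(pair) not in present]
    return list(edges) + rng.sample(candidates, n_extra)


# Hypothetical toy network: two triangles joined by the bridge (2, 3).
nodes = list(range(6))
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

rng = random.Random(0)  # fixed seed so the perturbation is repeatable
true_adj = to_adj(nodes, edges)
fn_adj = to_adj(nodes, drop_edges(edges, 0.3, rng))    # missing ties
fp_adj = to_adj(nodes, add_edges(nodes, edges, 2, rng))  # spurious ties

for label, adj in [("true", true_adj), ("false negative", fn_adj),
                   ("false positive", fp_adj)]:
    print(label, degree(adj), clustering(adj))
```

Comparing the printed measures across the three versions of the graph shows, on a very small scale, how a node's observed value can shift under each error type; a real study would repeat the perturbation many times and summarise the agreement between true and observed measures.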

► We simulate measurement error on two empirical networks, an online friendship graph and a citation graph.
► Networks with higher average clustering and more positively skewed degree distributions are less robust to measurement error.
► Clustering coefficient and network constraint are less robust to error than centrality measures.
► Missing nodes and edges are not consistently more harmful than spurious nodes and edges.
► Error correction strategies include focusing data cleaning on active node subsets and conditional imputation methods.

Keywords

Measurement error; Missing data; Simulation