Sampling networks by the union of m shortest path trees

Article ID	Journal	Published Year	Pages	File Type
451339	Computer Networks	2010	12 Pages	PDF

Abstract

Many network topology measurements capture or sample only a partial view of the actual network structure, which we call the underlying network. Sampling bias is a critical problem in the field of complex networks ranging from biological networks, social networks and artificial networks like the Internet. This bias phenomenon depends on both the sampling method of the measurements and the features of the underlying networks. In RIPE NCC and the PlanetLab measurement architectures, the Internet is mapped as G∪msptG∪mspt, the union of shortest paths between each pair of a set MM of m testboxes, or equivalently, m shortest path trees. In this paper, we investigate this sampling method on a wide class of real-world complex networks as well as on the weighted Erdös–Rényi random graphs. This general framework examines the effect of the set of testboxes on G∪msptG∪mspt. We establish the correlation between the subgraph GMGM of the underlying network, i.e. the set MM and the direct links between nodes of set MM, and the sampled network G∪msptG∪mspt. Furthermore, we illustrate that in order to obtain an increasingly accurate view of a given network, a higher than linear detection/measuring effort (the relative size m/Nm/N of set MM) is needed, where N is the size of the underlying network. Finally, when the relative size m/Nm/N of set MM is small, we characterize the kind of networks possessing small sampling bias, which provides insights on how to place the testboxes for good network topology measurement.

Keywords

Sampling bias Network sampling Shortest Path Tree