Article ID: 418080
Journal: Computational Statistics & Data Analysis
Published Year: 2007
Pages: 13
File Type: PDF
Abstract

Simple random sampling is a widely accepted basis for estimation from a population. When data arrive as a stream, the total population size grows continuously and only one pass through the data is possible. Reservoir sampling is a method of maintaining a fixed-size random sample from streaming data. Reservoir sampling without replacement has been studied extensively, and several algorithms with sub-linear time complexity exist. Although reservoir sampling with replacement has been mentioned previously by some authors, it has been studied very little and only linear-time algorithms exist. A with-replacement reservoir sampling algorithm of sub-linear time complexity is introduced. A thorough complexity analysis of several approaches to the with-replacement reservoir sampling problem is also provided.
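
For orientation, the sketch below illustrates the straightforward linear-time approach to with-replacement reservoir sampling, not the sub-linear algorithm the paper introduces. It assumes the standard construction in which each of the k reservoir slots behaves as an independent size-one reservoir: the t-th item overwrites a given slot with probability 1/t. The function name and parameters are hypothetical and chosen only for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample_with_replacement(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a size-k with-replacement sample of a stream in a single pass.

    Illustrative baseline only: every slot is examined for every item,
    so the cost is proportional to (stream length) * k.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for t, item in enumerate(stream, start=1):
        if t == 1:
            # The first item fills every slot (probability 1/1).
            reservoir = [item] * k
            continue
        for slot in range(k):
            # Each slot independently takes the t-th item with probability 1/t,
            # which keeps every slot uniform over the first t items.
            if rng.random() < 1.0 / t:
                reservoir[slot] = item
    return reservoir
```

Because the slots are updated independently, the same item may occupy several slots, which is what distinguishes this from the without-replacement case. The inner loop over all k slots for every arriving item is the linear cost that a sub-linear method would need to avoid.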
