Article ID: 6853915
Journal: Data & Knowledge Engineering
Published Year: 2018
Pages: 15
File Type: PDF
Abstract
Extracting information from large volumes of data is expensive in terms of resources such as CPU and memory, as well as computation time. Analyzing a small data set extracted from the original one is therefore often preferable. From this smaller set, called a sample, approximate results can be obtained, and the resulting errors are acceptable given the reduced cost of processing the data. Using sampling algorithms that introduce only small errors saves execution time and resources. This paper compares sampling algorithms to determine which performs best for set operations such as intersection, union, and difference. The comparison focuses on the errors introduced by each algorithm for different sample sizes and on execution times.
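The abstract does not name the sampling algorithms that were compared. As an illustration only, the sketch below uses coordinated (hash-based) sampling, one standard way to estimate the sizes of set operations from samples: each element is kept or dropped by a deterministic hash, so the same element is sampled consistently in both sets and intersection, union, and difference sizes can be scaled back by the sampling rate. All names, the sample rate, and the synthetic data are assumptions made for this example, not details from the paper.

```python
import hashlib


def hash_keep(x, p, salt="demo"):
    """Keep element x with probability ~p, decided by a hash so that
    the same element is kept or dropped consistently across sets."""
    h = int(hashlib.md5(f"{salt}:{x}".encode()).hexdigest(), 16)
    return (h % 10**6) < p * 10**6


def coordinated_sample(data, p):
    """Hash-based (coordinated) sample: roughly a fraction p of the set."""
    return {x for x in data if hash_keep(x, p)}


if __name__ == "__main__":
    # Two synthetic overlapping sets standing in for large relations.
    A = set(range(0, 60_000))
    B = set(range(40_000, 100_000))
    p = 0.05  # 5% sampling rate

    SA, SB = coordinated_sample(A, p), coordinated_sample(B, p)

    for name, exact, sampled in [
        ("union",        len(A | B), len(SA | SB)),
        ("intersection", len(A & B), len(SA & SB)),
        ("difference",   len(A - B), len(SA - SB)),
    ]:
        estimate = sampled / p          # scale the sampled result back up
        error = abs(estimate - exact) / exact * 100
        print(f"{name:13s} exact={exact:7d} estimate={estimate:10.0f} error={error:.2f}%")
```

Running the sketch prints the exact and estimated cardinalities of each set operation together with the relative error, which is the kind of error-versus-sample-size trade-off the paper's comparison examines.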
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence
Authors