Article ID: 452561
Journal: Computer Networks
Published Year: 2008
Pages: 13
File Type: PDF
Abstract

This work explores the use of two statistical techniques, stratified sampling and cluster analysis, as tools for deriving traffic properties at the flow level. Our results show that adequate sample selection leads to significant improvements and enables further statistical analysis. Although stratified sampling is a well-known technique, the way we classify the data prior to sampling is novel. We evaluate two partitioning clustering methods, clustering large applications (CLARA) and K-means, and validate their outcomes by using them as thresholds for stratified sampling. We show that by using flow sizes to divide the population, we can obtain accurate estimates of both flow size and flow duration. The presented sampling and clustering classification techniques achieve greater data reduction than existing methods, on the order of 0.1%, while maintaining good accuracy for estimates of the sum, mean, and variance of both flow duration and flow size.
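The idea of clustering flow sizes to obtain strata boundaries and then sampling within each stratum can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional K-means routine, the midpoint rule for turning centroids into thresholds, the sampling fraction, and all function names (`kmeans_1d`, `thresholds_from_centroids`, `stratify`, `stratified_estimates`) are assumptions made for illustration, and the flows are synthetic. CLARA is not shown.

```python
import random
import statistics


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's K-means on scalars; returns the k centroids, sorted."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: centroids placed at evenly spaced quantiles.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [statistics.fmean(g) if g else centroids[i]
                   for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def thresholds_from_centroids(centroids):
    """Assumed rule: a stratum boundary is the midpoint of adjacent centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def stratify(flows, thresholds):
    """Split (size, duration) flows into strata by flow size."""
    strata = [[] for _ in range(len(thresholds) + 1)]
    for size, duration in flows:
        index = sum(size > t for t in thresholds)
        strata[index].append((size, duration))
    return strata


def stratified_estimates(strata, fraction, rng):
    """Estimate population totals and means from a per-stratum random sample."""
    size_total = duration_total = 0.0
    population = 0
    for stratum in strata:
        if not stratum:
            continue
        n = min(len(stratum), max(1, round(fraction * len(stratum))))
        sample = rng.sample(stratum, n)
        # Scale each stratum's sample mean by the stratum's population size.
        size_total += len(stratum) * statistics.fmean([s for s, _ in sample])
        duration_total += len(stratum) * statistics.fmean([d for _, d in sample])
        population += len(stratum)
    return {
        "size_total": size_total,
        "size_mean": size_total / population,
        "duration_total": duration_total,
        "duration_mean": duration_total / population,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic heavy-tailed flow sizes (bytes) and durations (seconds).
    flows = [(rng.paretovariate(1.5) * 1000.0, rng.expovariate(1.0))
             for _ in range(5000)]
    cuts = thresholds_from_centroids(kmeans_1d([s for s, _ in flows], k=3))
    print(stratified_estimates(stratify(flows, cuts), fraction=0.05, rng=rng))
```

Stratifying on size alone while estimating both size and duration mirrors the abstract's claim that flow size is a sufficient partitioning variable; the sampling fraction used here is arbitrary and is not the reduction level reported in the paper.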

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications