کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
515858 867120 2014 16 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Variance reduction in large graph sampling
ترجمه فارسی عنوان
کاهش واریانس در نمونه برداری بزرگ گراف
کلمات کلیدی
نمونه برداری تصادفی یکنواخت، پیاده روی تصادفی، نمونه برداری گراف شبکه اجتماعی آنلاین، شبکه مقیاس آزاد، معنی هارمونیک
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر
چکیده انگلیسی


• We derive an upper bound for the variance of RE sampling.
• We show that variance of RN sampling grows with data size for scale-free networks.
• Thereby RE sampling excels when data size is large.
• We support the analytical results with simulation studies and 18 real networks.

The norm of practice in estimating graph properties is to use uniform random node (RN) samples whenever possible. Many graphs are large and scale-free, inducing large degree variance and estimator variance. This paper shows that random edge (RE) sampling and the corresponding harmonic mean estimator for average degree can reduce the estimation variance significantly. First, we demonstrate that the degree variance, and consequently the variance of the RN estimator, can grow almost linearly with data size for typical scale-free graphs. Then we prove that the RE estimator has a variance bounded from above. Therefore, the variance ratio between RN and RE samplings can be very large for big data. The analytical result is supported by both simulation studies and 18 real networks. We observe that the variance reduction ratio can be more than a hundred for some real networks such as Twitter. Furthermore, we show that random walk (RW) sampling is always worse than RE sampling, and it can reduce the variance of RN method only when its performance is close to that of RE sampling.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Processing & Management - Volume 50, Issue 3, May 2014, Pages 476–491
نویسندگان
, ,