کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
432849 | 689089 | 2010 | 13 صفحه PDF | دانلود رایگان |
![عکس صفحه اول مقاله: Identifying frequent items in a network using gossip Identifying frequent items in a network using gossip](/preview/png/432849.png)
We present algorithms for identifying frequently occurring items in a large distributed data set. Our algorithms use gossip as the underlying communication mechanism, and do not rely on any central control, nor on an underlying network structure, such as a spanning tree. Instead, nodes repeatedly select a random partner and exchange data with that partner. If this process continues for a (short) period of time, the desired results are computed, with probabilistic guarantees on the accuracy. Our algorithm for identifying frequent items is built by layering a novel small space “sketch” of data over a gossip-based data dissemination mechanism. We prove that the algorithm identifies the frequent items with high probability, and provides bounds on the time till convergence. To our knowledge, this is the first work on identifying frequent items using gossip.
Research highlights
► The frequent items in a dataset distributed over a network can be identified by a gossip-based communication among the nodes. The nodes exchange and combine sketches (small-space summaries) of their datasets.
► For N nodes, all frequent items can be identified, with high probability, after O(NlogN) rounds of gossip.
► Experiments with different data distributions suggest better convergence time and smaller sketch size than what theory demands.
Journal: Journal of Parallel and Distributed Computing - Volume 70, Issue 12, December 2010, Pages 1241–1253