Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
427748 | Information Processing Letters | 2012 | 6 | |
We propose a technique for reducing communication overheads when sending data across a network. Our technique, called hash challenges, builds on existing compare-by-hash deduplication solutions: it identifies redundant data chunks while exchanging substantially less meta-data. Hash challenges can be applied directly to any existing compare-by-hash protocol, with no significant additional computational complexity. Using real data from reference workloads, we show that hash challenges can save as much as 64% of the meta-data exchanged across the network, relative to plain compare-by-hash. This translates into reductions of up to 7% in overall transferred volume and performance gains of up to 16% over typical asymmetric broadband connections.
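For context, the plain compare-by-hash baseline that hash challenges build on can be sketched as follows. This is not the hash-challenges protocol itself, which the abstract does not describe in detail. It only shows where the meta-data cost comes from: the sender transmits one hash per chunk, and the receiver replies saying which chunks it lacks. Fixed-size 4 KiB chunks, SHA-1 digests, a one-bit-per-chunk reply bitmap, and the names `chunk` and `compare_by_hash_transfer` are all hypothetical choices made for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real systems often use content-defined chunking
DIGEST_SIZE = hashlib.sha1().digest_size  # 20 bytes per SHA-1 chunk hash


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compare_by_hash_transfer(data: bytes, receiver_store: dict[bytes, bytes]) -> tuple[bytes, int, int]:
    """Simulate one plain compare-by-hash exchange.

    Returns (data reconstructed at the receiver, meta-data bytes exchanged,
    chunk payload bytes actually sent).
    """
    chunks = chunk(data)

    # Step 1: the sender transmits one hash per chunk (meta-data).
    hashes = [hashlib.sha1(c).digest() for c in chunks]
    metadata = len(hashes) * DIGEST_SIZE

    # Step 2: the receiver replies with a bitmap marking the chunks it lacks
    # (hypothetical encoding: one bit per chunk, rounded up to whole bytes).
    missing = [i for i, h in enumerate(hashes) if h not in receiver_store]
    metadata += (len(hashes) + 7) // 8

    # Step 3: the sender transmits only the missing chunks.
    payload = 0
    for i in missing:
        receiver_store[hashes[i]] = chunks[i]
        payload += len(chunks[i])

    # The receiver rebuilds the data from its (now complete) chunk store.
    reconstructed = b"".join(receiver_store[h] for h in hashes)
    return reconstructed, metadata, payload


if __name__ == "__main__":
    # The receiver already holds four distinct 4 KiB chunks.
    old = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
    # The new version changes only the third chunk.
    new = old[:2 * CHUNK_SIZE] + b"x" * CHUNK_SIZE + old[3 * CHUNK_SIZE:]
    store = {hashlib.sha1(c).digest(): c for c in chunk(old)}
    data, meta, sent = compare_by_hash_transfer(new, store)
    assert data == new
    print(meta, sent)  # prints "81 4096": 4 hashes * 20 B + 1 bitmap byte; one chunk sent
```

In this accounting, hash challenges target the `metadata` term; the chunk payload needed to reconstruct the data stays the same.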
► We propose a novel distributed deduplication technique, called hash challenges.
► Substantial savings in meta-data overhead relative to compare-by-hash.
► Formal analysis confirms advantages in network efficiency.