| Article ID | Journal | Published Year | Pages | File Type |
| --- | --- | --- | --- | --- |
| 6872940 | Future Generation Computer Systems | 2018 | 26 | |
Abstract
As a storage-efficient approach, erasure coding has been adopted by many large-scale cloud storage systems to protect data from server and datacenter failures. For erasure-coded storage systems, it is critical to encode newly written data blocks and generate parity blocks efficiently. Existing encoding approaches include Striping Encoding and Replicating Encoding; they either incur excessive network traffic or seriously degrade I/O performance. In this paper, we propose Incremental Encoding, a decentralized encoding framework for all linear erasure codes. To achieve optimal write performance, Incremental Encoding forwards newly written data blocks to multiple servers in a pipelined manner. To reduce network traffic, Incremental Encoding combines newly written data blocks incrementally as they flow through servers to generate parity blocks. Incremental Encoding also caches intermediate parity blocks in memory to further reduce disk I/O. We evaluate Incremental Encoding by theoretically analyzing its encoding overheads and by conducting a series of experiments in both a single-datacenter environment and a cross-datacenter environment. The analysis and experiments show that Incremental Encoding achieves a much better trade-off between network traffic and I/O performance. Specifically, compared with Replicating Encoding, which has the optimal I/O performance, Incremental Encoding delivers nearly the same I/O performance with 44.5%-48.4% less encoding traffic. Compared with Striping Encoding, Incremental Encoding delivers up to 90% better write performance and up to 108% better read performance, at the cost of 56.25%-73.6% more encoding traffic.
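The abstract's core idea is that a parity server can fold each data block into a cached intermediate parity as the block flows past, instead of gathering a full stripe before encoding. The sketch below is only an illustration of that incremental-combination step, not the paper's implementation: the `ParityServer`, `accumulate`, and `finalize` names are hypothetical, forwarding to downstream servers is omitted, and plain XOR parity stands in for a general linear erasure code (which would multiply each block by a coding coefficient in GF(2^8) before combining).

```python
# Minimal sketch (assumptions noted above) of incrementally combining
# data blocks into a parity block as they flow through a parity server.

class ParityServer:
    """Hypothetical parity server that caches an intermediate parity in memory."""

    def __init__(self, block_size: int):
        # Intermediate parity is kept in memory to avoid extra disk I/O.
        self.partial_parity = bytearray(block_size)
        self.blocks_seen = 0

    def accumulate(self, data_block: bytes) -> None:
        # Incremental combination: fold the incoming block into the cached
        # intermediate parity. A general linear code would first scale the
        # block by its coding coefficient; XOR is used here for simplicity.
        for i, b in enumerate(data_block):
            self.partial_parity[i] ^= b
        self.blocks_seen += 1

    def finalize(self, stripe_width: int) -> bytes:
        # After every data block of the stripe has flowed through,
        # the cached intermediate parity is the final parity block.
        assert self.blocks_seen == stripe_width
        return bytes(self.partial_parity)


if __name__ == "__main__":
    server = ParityServer(block_size=4)
    for block in (b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"):
        server.accumulate(block)
    print(server.finalize(stripe_width=3).hex())  # XOR of the three blocks
```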
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Fangliang Xu, Yijie Wang, Xingkong Ma