Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
424858 | Future Generation Computer Systems | 2016 | 16 Pages |
• A networked storage architecture called DiVers is proposed to store versioned data.
• Sparsity-exploiting erasure coding is used to reduce storage overhead in DiVers.
• Reliability aspects of DiVers are addressed to achieve the best fault tolerance.
• System-level issues such as metadata management and the network protocol are discussed.
We propose a differential-versioning-based data storage (DiVers) architecture for distributed storage systems, which relies on a novel erasure coding technique that exploits sparsity across versions. This work demonstrates how sparsity-exploiting codes (SEC), originally designed for I/O optimization, can be extended to significantly reduce storage overhead in a repository of versioned data. Beyond reducing storage, we address key reliability aspects of DiVers: (i) mechanisms for deploying the coding technique when the size of the data varies arbitrarily across versions, and (ii) the allocation strategy for encoded blocks, across different versions and over a network of distributed nodes, that achieves the best fault tolerance. We also discuss system issues related to managing the data structures used to access and manipulate files across the differential versions.
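To make the notion of sparsity across versions concrete, the following is a minimal sketch and not the DiVers or SEC encoder itself. It assumes versions are byte strings, takes the difference as a byte-wise XOR, and zero-pads the shorter version so that sizes may differ; the names `delta`, `sparsity`, and `restore` are hypothetical. The erasure-coding step that would then encode the sparse difference is not shown.

```python
from itertools import zip_longest


def delta(prev: bytes, curr: bytes) -> bytes:
    """XOR two versions byte-wise, zero-padding the shorter one (assumption)."""
    return bytes(a ^ b for a, b in zip_longest(prev, curr, fillvalue=0))


def sparsity(diff: bytes) -> int:
    """Count the non-zero bytes, i.e. the positions that changed."""
    return sum(1 for b in diff if b)


def restore(prev: bytes, diff: bytes, length: int) -> bytes:
    """Recover the newer version from the older version and the delta."""
    return delta(prev, diff)[:length]


v1 = b"distributed storage"
v2 = b"distributed Storage!"

d = delta(v1, v2)                    # difference between consecutive versions
changed = sparsity(d)                # only a few positions are non-zero
recovered = restore(v1, d, len(v2))  # v2 rebuilt from v1 plus the delta
```

When consecutive versions differ in only a few positions, the delta is mostly zeros, and that is the property a sparsity-exploiting code can use to store less than a full re-encoding of each version.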