Article ID: 4956476
Journal ID: 1444519
Publication year: 2017
English article: 22 pages, PDF
English title of the ISI article
Resemblance and mergence based indexing for high performance data deduplication
Keywords
Fast index, deduplication, data resemblance, fingerprint retrieval, key-value index
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Networks and Communications
Abstract
Data deduplication, a data redundancy elimination technique, has been widely employed in many application environments to reduce data storage space. However, it is challenging to provide a fast and scalable key-value fingerprint index, particularly for large datasets, even though index performance is critical to overall deduplication performance. This paper proposes RMD, a resemblance and mergence based deduplication scheme, which aims to provide quick responses to fingerprint queries. The key idea of RMD is to leverage a Bloom filter array and a data resemblance algorithm to dramatically reduce the query range. At data ingestion time, RMD uses a resemblance algorithm to detect resembling data segments and places them in the same bin. As a result, at query time, it only needs to search the corresponding bin to detect duplicate content, which significantly speeds up the query process. Moreover, RMD uses a mergence strategy to accumulate resembling segments into the relevant bins, and exploits a frequency-based fingerprint retention policy to cap the bin capacity, improving query throughput and the data deduplication ratio. Extensive experimental results with real-world datasets show that RMD achieves high query performance and outperforms several well-known deduplication schemes.
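The abstract names the building blocks of RMD but not their details, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It rests on several assumptions: resemblance is detected with a min-hash style representative (the smallest chunk fingerprint in a segment), which is a commonly used stand-in and may differ from the paper's algorithm; each bin carries its own small Bloom filter, so the bins together form a Bloom filter array; and retention evicts the least frequently seen fingerprint once a bin is full. All names (`BloomFilter`, `Bin`, `ResemblanceIndex`, `bin_capacity`) and all sizes are hypothetical.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Small Bloom filter; the sizes are illustrative, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting SHA-1 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def fingerprint(chunk):
    """Chunk fingerprint; SHA-1 is an assumption made for this sketch."""
    return hashlib.sha1(chunk).digest()


class Bin:
    """Holds the fingerprints of resembling segments, capped in size."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = Counter()  # fingerprint -> number of times seen
        self.bloom = BloomFilter()

    def lookup_or_add(self, fp):
        """Return True if fp is already retained in this bin."""
        # The Bloom filter is a cheap pre-check. It cannot forget evicted
        # entries, so the exact table has the final say.
        if fp in self.bloom and fp in self.freq:
            self.freq[fp] += 1
            return True
        if len(self.freq) >= self.capacity:
            # Assumed retention policy: drop the least frequent fingerprint.
            victim = min(self.freq, key=self.freq.get)
            del self.freq[victim]
        self.freq[fp] = 1
        self.bloom.add(fp)
        return False


class ResemblanceIndex:
    def __init__(self, bin_capacity=1000):
        self.bin_capacity = bin_capacity
        self.bins = {}  # representative fingerprint -> Bin

    def ingest(self, segment):
        """Ingest a non-empty list of chunks; return the duplicate count."""
        fps = [fingerprint(chunk) for chunk in segment]
        representative = min(fps)  # min-hash style resemblance key
        # Segments that share a representative are merged into one bin,
        # so only that bin is searched for duplicates.
        target = self.bins.setdefault(representative, Bin(self.bin_capacity))
        return sum(target.lookup_or_add(fp) for fp in fps)
```

In this sketch, ingesting the same segment twice reports no duplicates on the first pass and every chunk as a duplicate on the second, because both passes map to the same bin. Chunks of a segment that lands in a different bin are not compared against each other, which is the trade-off a bin-restricted search makes between query speed and deduplication ratio.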
Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Systems and Software - Volume 128, June 2017, Pages 11-24