Improving big data storage performance in hybrid environment

Article ID	Journal	Published Year	Pages	File Type
6874367	Journal of Computational Science	2018	18 Pages	PDF

Abstract

Hybrid storage can meet the real-time processing and large-capacity requirements of big data. However, the widely used big data storage platform HDFS still cannot efficiently utilize emerging devices in a hybrid environment, and most existing research on hybrid storage fails to consider the asymmetric characteristics of devices and data. This paper proposes a preference model that quantitatively weights the storage performance imbalance arising when data are distributed across different devices, and then places data on the storage device whose performance efficiently matches the data's access characteristics. The implemented Preference-Aware HDFS (PAHDFS) shows high performance, efficiency, and scalability in experiments.
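
The abstract does not give the form of the preference model, so the following is only a minimal sketch of the general idea of preference-based placement, not the paper's method. All names (`Device`, `Block`, `preference`, `place`), the throughput figures, and the scoring formula (expected throughput for a block's read/write mix) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    read_mbps: float   # assumed sequential read throughput
    write_mbps: float  # assumed sequential write throughput
    free_gb: float     # remaining capacity


@dataclass
class Block:
    name: str
    size_gb: float
    read_fraction: float  # share of accesses that are reads, in [0, 1]


def preference(block: Block, device: Device) -> float:
    """Hypothetical score: throughput weighted by the block's read/write mix."""
    return (block.read_fraction * device.read_mbps
            + (1.0 - block.read_fraction) * device.write_mbps)


def place(block: Block, devices: list[Device]) -> Device:
    """Pick the device with the highest score that still has room for the block."""
    candidates = [d for d in devices if d.free_gb >= block.size_gb]
    if not candidates:
        raise ValueError(f"no device has room for block {block.name!r}")
    best = max(candidates, key=lambda d: preference(block, d))
    best.free_gb -= block.size_gb  # reserve the space on the chosen device
    return best


# Illustrative values only; they are not measurements from the paper.
devices = [
    Device("ssd", read_mbps=500.0, write_mbps=450.0, free_gb=1.0),
    Device("hdd", read_mbps=150.0, write_mbps=120.0, free_gb=100.0),
]
hot_target = place(Block("hot", size_gb=0.5, read_fraction=0.9), devices)
cold_target = place(Block("cold", size_gb=0.8, read_fraction=0.5), devices)
```

In this sketch the read-heavy block goes to the faster device, while the second block no longer fits there and falls back to the larger, slower one. A real model would also have to account for the device and data asymmetries the abstract mentions.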

Keywords

HDFS; Big Data