Article ID | Journal ID | Publication year | English article | Full-text version |
---|---|---|---|---|
4968255 | 1449569 | 2017 | 17-page PDF | Free download |

- Develop a new two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system.
- Model the I/O throughput of two-level storage and compare it with that of HDFS and OrangeFS.
- Build a prototype of the two-level storage system with Tachyon and OrangeFS.
- Experiments on real systems show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves weak scalability for both reads and writes.

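
To make the two-tier idea in the highlights concrete, here is a minimal Python sketch. It assumes a bounded in-memory upper tier with LRU eviction, write-through to a persistent lower tier, and promotion of lower-tier data into memory on a read miss. The class `TwoLevelStore` and its methods are hypothetical. Two dictionaries stand in for Tachyon and OrangeFS, and the sketch does not reproduce either system's API or the paper's actual design.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TwoLevelStore:
    """Hypothetical two-tier store: bounded in-memory tier over a persistent lower tier."""

    def __init__(self, memory_capacity: int) -> None:
        if memory_capacity < 1:
            raise ValueError("memory_capacity must be at least 1")
        self.memory_capacity = memory_capacity
        # Upper tier (stand-in for an in-memory FS such as Tachyon), kept in LRU order.
        self.memory: "OrderedDict[str, bytes]" = OrderedDict()
        # Lower tier (stand-in for a parallel FS such as OrangeFS); never evicts.
        self.lower: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def write(self, path: str, data: bytes) -> None:
        # Assumed write-through: persist in the lower tier first, then cache in memory.
        self.lower[path] = data
        self._cache(path, data)

    def read(self, path: str) -> Optional[bytes]:
        if path in self.memory:
            self.hits += 1
            self.memory.move_to_end(path)  # mark as most recently used
            return self.memory[path]
        self.misses += 1
        data = self.lower.get(path)
        if data is not None:
            self._cache(path, data)  # promote into the memory tier on a miss
        return data

    def _cache(self, path: str, data: bytes) -> None:
        if path in self.memory:
            self.memory.move_to_end(path)
        elif len(self.memory) >= self.memory_capacity:
            self.memory.popitem(last=False)  # evict the least recently used entry
        self.memory[path] = data


store = TwoLevelStore(memory_capacity=2)
store.write("/data/a", b"A")
store.write("/data/b", b"B")
store.write("/data/c", b"C")   # evicts /data/a from memory only
print(store.read("/data/a"))   # miss: served from the lower tier, then promoted
print(store.hits, store.misses)
```

Under this assumed write-through policy the lower tier always holds the durable copy, so evicting a file from memory never loses data. The trade-off is that every write pays the lower tier's cost.
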
Data-intensive applications that are inherently I/O-bound have become a major workload on traditional high-performance computing (HPC) clusters. Simply adopting data-intensive computing storage such as HDFS, or using the parallel file systems available on HPC clusters, to serve such applications leads to performance and scalability problems. In this paper, we present a novel two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system. The former provides high I/O performance at memory speed, and the latter provides consistent, large-capacity storage. We build a prototype of the two-level storage system with Tachyon and OrangeFS and analyze the resulting I/O throughput for typical MapReduce operations. Theoretical modeling and experiments show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves scalable performance for both reads and writes. We expect this two-level storage approach to offer insights into system design for big data analytics on HPC clusters.

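
The abstract attributes the higher aggregate throughput to serving data at memory speed from the upper tier. As a hedged, first-order illustration (not the paper's model), assume N nodes, a per-node memory-tier bandwidth, a fixed aggregate parallel-file-system bandwidth, and a fraction h of the data served from memory, with the two tiers' transfer times adding rather than overlapping. The function name and parameters are hypothetical.

```python
def aggregate_read_throughput(n_nodes: int, hit_ratio: float,
                              mem_bw_per_node: float, pfs_bw: float) -> float:
    """Hypothetical first-order model of aggregate read throughput.

    A fraction `hit_ratio` of the bytes comes from the in-memory tier, whose bandwidth
    grows with node count; the remainder comes from a parallel file system whose
    aggregate bandwidth `pfs_bw` is fixed. Transfer times of the two tiers are assumed
    to add, so throughput is the reciprocal of the combined time per unit of data.
    """
    if n_nodes < 1 or mem_bw_per_node <= 0 or pfs_bw <= 0:
        raise ValueError("node count and bandwidths must be positive")
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    time_per_unit = (hit_ratio / (n_nodes * mem_bw_per_node)
                     + (1.0 - hit_ratio) / pfs_bw)
    return 1.0 / time_per_unit


# Arbitrary illustrative parameters (GB/s), not measurements from the paper.
for h in (0.0, 0.5, 0.9, 1.0):
    print(f"h={h:.1f}: {aggregate_read_throughput(16, h, 5.0, 10.0):.1f} GB/s")
```

In this toy model throughput ranges from the parallel file system's bandwidth when nothing is cached (h = 0) to N times the per-node memory bandwidth when everything is (h = 1). The memory tier's share therefore grows with node count while the parallel file system's share stays fixed.
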
Journal: Parallel Computing, Volume 61, January 2017, pages 18-34