Article code: 425850
Journal code: 685931
Publication year: 2015
English article: 9 pages, full-text PDF
English title of the ISI article
ParSA: High-throughput scientific data analysis framework with distributed file system
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
Abstract


• A high-throughput scientific data analysis framework built on a distributed file system.
• Groups and splits logical units, and schedules the distribution of block replicas.
• Reduces network reads and overlaps data reading, processing, and transferring.
• Provides logical operation interfaces rather than raw I/O interfaces.
• Reaches 1.3 GB/s on a six-node Hadoop cluster (five disks per node), compared with 392 MB/s on a RAID-6 node.
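
To put the throughput figures in the last highlight side by side, the short sketch below works out the ratio and the per-node and per-disk averages. It is only arithmetic on the quoted numbers. The decimal unit convention (1 GB = 1000 MB) and the function name `throughput_summary` are assumptions made for illustration, and because the disk count of the RAID-6 node is not given, the per-disk figure applies to the Hadoop cluster only.

```python
# Figures quoted in the highlights.
# Assumption: decimal units, i.e. 1 GB = 1000 MB.
CLUSTER_MB_PER_S = 1.3 * 1000   # six-node Hadoop cluster
RAID6_MB_PER_S = 392            # single RAID-6 storage node
NODES = 6
DISKS_PER_NODE = 5


def throughput_summary():
    """Return simple ratios derived from the quoted throughput numbers."""
    return {
        # How many times faster the cluster is than the RAID-6 node.
        "speedup_vs_raid6": CLUSTER_MB_PER_S / RAID6_MB_PER_S,
        # Average throughput contributed by each cluster node.
        "mb_per_s_per_node": CLUSTER_MB_PER_S / NODES,
        # Average throughput contributed by each cluster disk.
        "mb_per_s_per_disk": CLUSTER_MB_PER_S / (NODES * DISKS_PER_NODE),
    }


if __name__ == "__main__":
    for name, value in throughput_summary().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the cluster is roughly 3.3 times faster than the RAID-6 node, averaging about 217 MB/s per node and about 43 MB/s per disk.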

Scientific data analysis and visualization have become key components of today's large-scale simulations. Because data volumes are growing rapidly and highly structured files exhibit awkward I/O patterns, known serial methods and tools do not scale well and usually perform poorly on traditional architectures. In this paper, we propose ParSA (parallel scientific data analysis), a new framework for high-throughput, scalable scientific analysis on a distributed file system. ParSA presents optimization strategies for grouping and splitting logical units to exploit the distributed I/O of the file system, for scheduling the distribution of block replicas to reduce network reads, and for maximizing the overlap of data reading, processing, and transferring during computation. In addition, ParSA provides interfaces similar to those of the NetCDF Operator (NCO), which is used in most climate data diagnostic packages, making the framework easy to adopt. We use ParSA to accelerate well-known analysis methods for climate models on the Hadoop Distributed File System (HDFS). Experimental results demonstrate the high efficiency and scalability of ParSA: it reaches a maximum throughput of 1.3 GB/s on a six-node Hadoop cluster with five disks per node, compared with only 392 MB/s on a RAID-6 storage node.
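
The abstract mentions overlapping data reading, processing, and transferring. The following is a minimal, generic sketch of that idea using a bounded producer-consumer pipeline in Python. It is not ParSA's implementation: the function names (`read_blocks`, `process_blocks`, `run_pipeline`), the in-memory lists standing in for file blocks, and the per-block mean used as the "analysis" are all hypothetical.

```python
import queue
import threading


def read_blocks(blocks, out_q):
    """Reader stage: stands in for fetching blocks from a file system."""
    for block in blocks:
        out_q.put(block)      # blocks when the queue is full (back-pressure)
    out_q.put(None)           # sentinel: no more blocks


def process_blocks(in_q, out_q):
    """Processing stage: computes the mean of each non-empty block."""
    while True:
        block = in_q.get()
        if block is None:
            out_q.put(None)   # forward the sentinel downstream
            break
        out_q.put(sum(block) / len(block))


def run_pipeline(blocks, depth=2):
    """Run reader and processor concurrently; collect results in order."""
    read_q = queue.Queue(maxsize=depth)
    result_q = queue.Queue(maxsize=depth)
    reader = threading.Thread(target=read_blocks, args=(blocks, read_q))
    worker = threading.Thread(target=process_blocks, args=(read_q, result_q))
    reader.start()
    worker.start()

    results = []
    while True:               # "transfer" stage: drain results as they arrive
        item = result_q.get()
        if item is None:
            break
        results.append(item)

    reader.join()
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[1, 2, 3], [4, 5, 6]]))
```

The bounded queues let the reader run ahead of the processor by at most `depth` blocks, so reading the next block can proceed while the current one is being processed, without unbounded memory growth.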

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 51, October 2015, Pages 111–119