Article ID: 6872841
Journal: Future Generation Computer Systems
Published Year: 2018
Pages: 18
File Type: PDF
Abstract
Extreme-scale simulations and experiments can generate large amounts of data, whose volume can exceed the compute and/or storage capacity at the simulation or experimental facility. With the emergence of ultra-high-speed networks, researchers are considering pipelined approaches in which data are passed to a remote facility for analysis. Here we examine an extreme-scale cosmology simulation that, when run on a large fraction of a leadership-class computer, generates data at a rate of one petabyte per elapsed day. Writing those data to disk is inefficient and impractical, and in situ analysis poses its own difficulties. Thus we implement a pipeline in which data are generated on one supercomputer and then transferred, as they are generated, to a remote supercomputer for analysis. We use the Swift scripting language to instantiate this pipeline across Argonne National Laboratory and the National Center for Supercomputing Applications, which are connected by a 100 Gb/s network, and we demonstrate that by using the Globus transfer service we can achieve a sustained rate of 93 Gb/s over a 24-hour period, thus attaining our performance goal of moving one petabyte in 24 hours. This paper describes the methods used and summarizes the lessons learned in this demonstration.
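A quick unit check (not in the original abstract) shows why the 93 Gb/s figure meets the one-petabyte-per-day goal. Assuming one petabyte means 10^15 bytes, the required sustained rate is

\[
\frac{10^{15}\ \mathrm{B} \times 8\ \mathrm{bit/B}}{86\,400\ \mathrm{s}} \approx 9.26 \times 10^{10}\ \mathrm{bit/s} \approx 92.6\ \mathrm{Gb/s},
\]

so the demonstrated 93 Gb/s sustained over 24 hours sits just above the minimum needed to move one petabyte in a day.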
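As a rough illustration of the transfer step only, the following Python sketch submits a single Globus transfer task using the Globus Python SDK (globus_sdk). This is a minimal sketch under stated assumptions: the client ID, endpoint UUIDs, and paths are hypothetical placeholders, and the paper's actual Swift-driven orchestration, which launches transfers as simulation output appears, is not shown here.

import globus_sdk

# Hypothetical identifiers -- illustrative placeholders, not the paper's values.
CLIENT_ID = "00000000-0000-0000-0000-000000000000"     # a registered Globus native app
SRC_ENDPOINT = "11111111-1111-1111-1111-111111111111"  # e.g., the source (Argonne) endpoint
DST_ENDPOINT = "22222222-2222-2222-2222-222222222222"  # e.g., the destination (NCSA) endpoint

# Authenticate with the native-app OAuth2 flow and build a TransferClient.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow(requested_scopes=globus_sdk.scopes.TransferScopes.all)
print("Log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_tokens = tokens.by_resource_server["transfer.api.globus.org"]
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_tokens["access_token"])
)

# Describe one transfer task: move a batch of simulation output (paths hypothetical).
tdata = globus_sdk.TransferData(
    tc,
    SRC_ENDPOINT,
    DST_ENDPOINT,
    label="cosmology snapshot batch",
    sync_level="checksum",  # verify integrity; skip files already transferred
)
tdata.add_item("/scratch/sim/snapshot_042/", "/projects/analysis/snapshot_042/",
               recursive=True)

task = tc.submit_transfer(tdata)
print("Submitted Globus transfer task:", task["task_id"])

In the pipeline the abstract describes, Swift would coordinate many such transfer tasks, overlapping data generation at Argonne with transfer to, and analysis at, NCSA.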
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics