Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6873417 | Future Generation Computer Systems | 2018 | 9 Pages |
Abstract
The large amount of time spent transferring experimental data in fields such as genomics is hampering the ability of scientists to generate new knowledge. Often, computer hardware is capable of faster transfers but sub-optimal transfer software and configurations are limiting performance. This work seeks to serve as a guide to identifying the optimal configuration for performing genomics data transfers. A wide variety of tests narrow in on the optimal data transfer parameters for parallel data streaming across Internet2 and between two CloudLab clusters loading real genomics data onto a parallel file system. The best throughput was found to occur with a configuration using GridFTP with at least 5 parallel TCP streams with a 16 MiB TCP socket buffer size to transfer to/from 4-8 BeeGFS parallel file system nodes connected by InfiniBand.
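As an illustration of the configuration reported above, the following minimal sketch assembles a `globus-url-copy` command line with 5 parallel streams and a 16 MiB TCP buffer. It is not the authors' actual invocation, which the abstract does not give. The helper name `build_transfer_command`, the host name, and the paths are hypothetical; `-p` (parallel streams) and `-tcp-bs` (TCP buffer size in bytes) are assumed to be the relevant `globus-url-copy` options and should be checked against the installed version.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def build_transfer_command(source_url, dest_url, streams=5, buffer_mib=16):
    """Return an argument list for a GridFTP transfer (hypothetical helper).

    Defaults follow the abstract: at least 5 parallel TCP streams and a
    16 MiB TCP socket buffer. The flags are assumptions about the client.
    """
    if streams < 1:
        raise ValueError("streams must be at least 1")
    if buffer_mib <= 0:
        raise ValueError("buffer_mib must be positive")
    return [
        "globus-url-copy",
        "-p", str(streams),               # number of parallel TCP streams
        "-tcp-bs", str(buffer_mib * MIB),  # TCP buffer size in bytes
        source_url,
        dest_url,
    ]


if __name__ == "__main__":
    # Placeholder endpoints; replace with real GridFTP server and paths.
    cmd = build_transfer_command(
        "gsiftp://source.example.org/data/sample.fastq",
        "file:///mnt/beegfs/sample.fastq",
    )
    print(" ".join(cmd))
```

The command is only printed here, not executed, so the sketch can be run without a GridFTP installation; passing the list to `subprocess.run` would start the transfer.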
Related Topics
Physical Sciences and Engineering
Computer Science
Computational Theory and Mathematics
Authors
Nicholas Mills, F. Alex Feltus, Walter B. Ligon III