| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 5521180 | Drug Discovery Today | 2017 | 6 | |
- Analysis of massive NGS datasets poses difficult computational challenges.
- Big data algorithms are often adapted for NGS analysis.
- HPC becomes pivotal as NGS moves from research labs to medical applications.
The progress of next-generation sequencing has a major impact on medical and genomic research. This high-throughput technology can now produce billions of short DNA or RNA fragments, amounting to several terabytes of data, in a single run. This leads to massive datasets used by a wide range of applications, including personalized cancer treatment and precision medicine. In addition to the hugely increased throughput, the cost of high-throughput sequencing has decreased dramatically. A sequencing cost of around US$1000 per genome has now rendered large population-scale projects feasible. However, making effective use of the produced data requires the design of big data algorithms and their efficient implementation on modern high-performance computing systems.
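As a rough illustration of the data volumes behind these claims, the sketch below estimates the raw size of a single sequencing run. The read count, read length, and per-base storage overhead are illustrative assumptions, not figures taken from the article.

```python
# Back-of-envelope estimate of the raw data volume of one sequencing run.
# All parameters below are assumed for illustration only.

reads_per_run = 3_000_000_000   # billions of short fragments per run (assumed)
read_length_bp = 150            # typical short-read length in base pairs (assumed)
bytes_per_base = 2              # ~1 byte per base call + ~1 byte per quality score (uncompressed FASTQ)

total_bytes = reads_per_run * read_length_bp * bytes_per_base
print(f"Approximate uncompressed run size: {total_bytes / 1e12:.1f} TB")
# -> roughly 0.9 TB; paired-end reads, identifiers, and metadata push a run
#    into the multi-terabyte range mentioned in the abstract.
```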