Journal: Computational Biology and Chemistry
Published: 2016
Pages: 7
Article ID: 6451289
File type: PDF
Abstract

• Reference-free, assembly-based discovery of single nucleotide polymorphisms (SNPs) from next-generation sequencing data of bacterial genomes.
• A bioinformatics pipeline that constructs an assembly using SPAdes and then applies a set of filters to remove unreliable regions based on the quality and coverage of re-aligned reads.
• SnpFilt improves the precision of SNP calling in bacterial genomes.
• Reliable and complete SNP calls require at least 40-fold coverage.
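
The last two highlights rest on two simple quantities: the precision of a SNP call set measured against trusted (e.g. reference-based) calls, and the mean sequencing depth of a sample. The sketch below computes both. It is illustrative only and is not part of SnpFilt: the tuple representation of a SNP and the function names are hypothetical, both call sets are assumed to already share one coordinate system, and mean depth is approximated from raw sequenced bases rather than from aligned reads. Only the 40-fold figure comes from the abstract.

```python
from typing import Set, Tuple

# Hypothetical SNP representation: (contig, 1-based position, alternate base).
Snp = Tuple[str, int, str]

# Minimum mean depth the abstract reports for reliable, complete SNP calls.
MIN_FOLD_COVERAGE = 40.0


def precision_recall(called: Set[Snp], truth: Set[Snp]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    Returns NaN for any ratio whose denominator is zero.
    """
    tp = len(called & truth)   # calls confirmed by the trusted set
    fp = len(called - truth)   # calls absent from the trusted set
    fn = len(truth - called)   # trusted SNPs that were missed
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


def mean_coverage(total_read_bases: int, genome_length: int) -> float:
    """Approximate mean depth: total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return total_read_bases / genome_length


def has_sufficient_coverage(total_read_bases: int, genome_length: int,
                            min_fold: float = MIN_FOLD_COVERAGE) -> bool:
    """True if the approximate mean depth reaches min_fold."""
    return mean_coverage(total_read_bases, genome_length) >= min_fold


if __name__ == "__main__":
    truth = {("c1", 100, "A"), ("c1", 250, "T"), ("c2", 40, "G")}
    called = {("c1", 100, "A"), ("c1", 250, "T"), ("c1", 999, "C")}
    # Two true positives, one false positive, one false negative.
    print(precision_recall(called, truth))
    # 1.6 million 150 bp reads over a hypothetical 5 Mb genome.
    print(mean_coverage(1_600_000 * 150, 5_000_000))  # 48.0
    print(has_sufficient_coverage(1_600_000 * 150, 5_000_000))  # True
```

Raising precision (fewer false positives) is what filtering unreliable regions targets; the coverage check is a cheap pre-screen before assembly.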

De novo assembly of bacterial genomes from next-generation sequencing (NGS) data allows reference-free discovery of single nucleotide polymorphisms (SNPs). However, the substantial error rates in genomes assembled this way remain a major barrier to reference-free analysis of genome variation in medically important bacteria. The aim of this report was to improve the quality of SNP identification in bacterial genomes for which no closely related reference is available. We developed a bioinformatics pipeline, SnpFilt, that constructs an assembly using SPAdes and then removes unreliable regions based on the quality and coverage of re-aligned reads at neighbouring regions. The performance of the pipeline was compared against reference-based SNP calling for Illumina HiSeq, MiSeq and NextSeq reads from a range of bacterial pathogens, including Salmonella, one of the most common causes of food-borne disease. The SnpFilt pipeline removed all false-positive SNPs in every test NGS dataset of paired-end Illumina reads. We also showed that reliable and complete SNP calls require at least 40-fold coverage. Analysis of bacterial isolates associated with epidemiologically confirmed outbreaks using SnpFilt produced results consistent with previously published findings. By removing regions of the assembly that may contain assembly errors, SnpFilt improves the quality of de novo assemblies and the precision of SNP calling in bacterial genomes. SnpFilt is available from https://github.com/LanLab/SnpFilt.
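
The filtering idea described above (discard assembly positions whose neighbourhood has weak read support) can be sketched as a sliding-window mask. This is a minimal illustration under stated assumptions, not SnpFilt's actual code or filter set: the function name `mask_unreliable`, the window size and both thresholds are hypothetical placeholders, and the per-position depth and base-quality arrays are assumed to have been computed already by re-aligning the reads to the SPAdes contigs.

```python
from typing import Sequence


def mask_unreliable(seq: str, depth: Sequence[int], qual: Sequence[float],
                    window: int = 5, min_depth: int = 10,
                    min_qual: float = 20.0) -> str:
    """Replace with 'N' every position whose neighbourhood (position +/- window)
    contains any base with depth below min_depth or mean base quality below
    min_qual. Thresholds are hypothetical, not SnpFilt defaults.
    """
    if not (len(seq) == len(depth) == len(qual)):
        raise ValueError("seq, depth and qual must have the same length")
    n = len(seq)
    masked = list(seq)
    for i in range(n):
        # Clip the window at the contig ends.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        if min(depth[lo:hi]) < min_depth or min(qual[lo:hi]) < min_qual:
            masked[i] = "N"  # weak support nearby: treat as unreliable
    return "".join(masked)


if __name__ == "__main__":
    contig = "ACGTACGTAC"
    depth = [50] * 10
    depth[5] = 3           # one poorly covered base
    qual = [35.0] * 10
    print(mask_unreliable(contig, depth, qual, window=1))  # ACGTNNNTAC
```

Masking a whole neighbourhood rather than the single weak base reflects the abstract's point that reliability is judged from neighbouring regions: an error-prone stretch often degrades the alignments around it, so SNPs called next to it are also suspect.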
