Article ID: 6452103
Journal: Journal of Biotechnology
Published: 2017
Pages: 5
File type: PDF
Abstract

• Indel callers were tested on short-read data sets simulated from a bacterial genome.
• The tools produced divergent results of varying quality.
• Most tools failed to detect large proportions of long indels (>29 bp).
• ScanIndel recovered indels of all lengths with excellent sensitivity and false discovery rate.
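The highlights compare callers by sensitivity and false discovery rate (FDR), split into short (≤29 bp) and long (>29 bp) indels. The following is a minimal sketch of how such a size-stratified evaluation could be computed against a set of predefined (true) indels. The `Indel` record, the matching rule (same type and length, position within a hypothetical `pos_tol` window) and all function names are illustrative assumptions, not the procedure used in the study.

```python
from dataclasses import dataclass

# Size split used in the abstract: short indels are <=29 bp, long ones >29 bp.
LONG_MIN_BP = 30


@dataclass(frozen=True)
class Indel:
    pos: int     # reference position (coordinate convention is an assumption)
    kind: str    # "INS" or "DEL"
    length: int  # indel length in bp


def is_long(indel: Indel) -> bool:
    return indel.length >= LONG_MIN_BP


def match_calls(truth, calls, pos_tol=5):
    """Greedily pair each call with at most one unmatched true indel of the
    same type and length lying within pos_tol bp (hypothetical matching rule)."""
    matched_truth = set()  # indices into truth that were recovered
    matched_calls = set()  # indices into calls that are true positives
    for ci, call in enumerate(calls):
        for ti, t in enumerate(truth):
            if (ti not in matched_truth and t.kind == call.kind
                    and t.length == call.length
                    and abs(t.pos - call.pos) <= pos_tol):
                matched_truth.add(ti)
                matched_calls.add(ci)
                break
    return matched_truth, matched_calls


def metrics_by_size(truth, calls, pos_tol=5):
    """Per size class: sensitivity = recovered true indels / true indels,
    FDR = unmatched calls / all calls (NaN when a class is empty)."""
    matched_truth, matched_calls = match_calls(truth, calls, pos_tol)
    out = {}
    for label, want_long in (("short", False), ("long", True)):
        t_idx = [i for i, t in enumerate(truth) if is_long(t) == want_long]
        c_idx = [i for i, c in enumerate(calls) if is_long(c) == want_long]
        tp_truth = sum(1 for i in t_idx if i in matched_truth)
        fp_calls = sum(1 for i in c_idx if i not in matched_calls)
        out[label] = {
            "sensitivity": tp_truth / len(t_idx) if t_idx else float("nan"),
            "fdr": fp_calls / len(c_idx) if c_idx else float("nan"),
        }
    return out


# Toy example with made-up indels (not data from the study).
truth = [Indel(100, "DEL", 3), Indel(500, "INS", 45), Indel(900, "DEL", 120)]
calls = [Indel(101, "DEL", 3), Indel(500, "INS", 45), Indel(2000, "INS", 2)]
print(metrics_by_size(truth, calls))
# short: sensitivity 1.0, FDR 0.5; long: sensitivity 0.5, FDR 0.0
```

Greedy matching keeps the sketch short. Real benchmarks usually also account for the positional ambiguity of indels in repetitive sequence (for example by left-normalizing calls), which this sketch ignores.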

We tested the ability of four software tools to detect insertions and deletions (indels) in a bacterial genome from short sequencing reads. The tools included two that use gapped alignment (VarScan, FreeBayes), one that uses the split-read method (Pindel), and one that combines these approaches with local de novo assembly (ScanIndel). Tests were performed with 151-bp paired-end sequencing reads simulated from a bacterial (Clostridioides difficile R20291) genome sequence carrying predefined indels (1-2321 bp in length). Results varied widely between tools, and sensitivity and false discovery rate depended strongly on indel size. All tools detected short indels (≤29 bp) with sensitivities close to 100%, although Pindel reported up to 20% false calls. In contrast, the gapped-alignment and split-read tools failed to recover large proportions of long indels (>29 bp) even at 120-fold coverage, and Pindel again produced considerable numbers of false-positive calls. Notably, ScanIndel detected and reconstructed 97% of long indels on average (95% confidence intervals, 88%-99%) while producing a negligible number of false calls. Hence, the combinatorial approach implemented in ScanIndel recovered the positions, types and sequences of indels across the full length spectrum present in the data sets, with excellent sensitivity and false discovery rate.
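The abstract reports 95% confidence intervals for sensitivity but does not state how they were computed or how many indels each estimate is based on, so the reported 88%-99% range cannot be reproduced here. As one common option for a binomial proportion such as sensitivity (an assumption, not necessarily the authors' method), a Wilson score interval can be sketched as follows; `wilson_interval` and `Z_95` are illustrative names.

```python
import math

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


def wilson_interval(successes: int, n: int, z: float = Z_95):
    """Wilson score interval for the proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp tiny floating-point excursions outside [0, 1].
    return max(0.0, centre - half), min(1.0, centre + half)


# Illustrative numbers only: 50 of 100 true indels recovered.
low, high = wilson_interval(50, 100)
print(f"{low:.4f}-{high:.4f}")  # prints 0.4038-0.5962
```

Unlike the simple normal-approximation (Wald) interval, the Wilson interval stays within [0, 1] and does not collapse to zero width when the observed proportion is 100%, which matters when sensitivities approach 1.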
