Article code  Journal code  Publication year  English article  Full-text version
6452103  1416994  2017  5 pages, PDF  Free download
English title of the ISI article
The challenge of detecting indels in bacterial genomes from short-read sequencing data
Related topics
Engineering and Basic Sciences, Chemical Engineering, Bioengineering
Abstract


- Indel callers were tested on short-read data sets simulated from a bacterial genome.
- Different tools produced divergent results of varying quality.
- Most tools failed to detect large proportions of long indels (>29 bp).
- ScanIndel recovered indels across the full length spectrum with excellent sensitivity and a low false-discovery rate.

We tested the ability of four software tools to detect insertions and deletions (indels) in a bacterial genome from short sequencing reads. The tools applied gapped alignment (VarScan, FreeBayes), the split-read method (Pindel), or a combinatorial approach with local de novo assembly (ScanIndel). Tests were performed with 151-bp, paired-end sequencing reads simulated from a bacterial genome sequence (Clostridioides difficile R20291) carrying predefined indels (indel length, 1-2321 bp). Results varied widely among the tools, and sensitivity and false-discovery rate depended strongly on indel size. All tools detected short indels (≤29 bp) at sensitivities close to 100%, although Pindel reported up to 20% false calls. In contrast, the gapped-alignment and split-read tools failed to recover large proportions of long indels (>29 bp) even at 120-fold coverage, and Pindel again produced substantial numbers of false-positive calls. ScanIndel, by comparison, detected and reconstructed 97% of long indels on average (95% confidence interval, 88%-99%) while producing negligible numbers of false calls. Hence, the combinatorial approach implemented in ScanIndel recovered the positions, types and sequences of indels with excellent sensitivity and a low false-discovery rate across the full indel length spectrum present in the datasets.
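
The two metrics reported in the abstract, sensitivity and false-discovery rate split by indel size, can be illustrated with a minimal sketch. It assumes that a call counts as correct only if its position, type and length exactly match a simulated indel; the matching criteria actually used in the study are not stated here. The function names and the toy data are hypothetical and do not reproduce any result from the paper.

```python
from collections import Counter

# The abstract treats indels longer than 29 bp as "long".
LONG_INDEL_THRESHOLD = 29


def size_class(length):
    """Assign an indel to the 'short' (<=29 bp) or 'long' (>29 bp) class."""
    return "long" if length > LONG_INDEL_THRESHOLD else "short"


def evaluate(truth, calls):
    """Compute sensitivity and false-discovery rate per size class.

    truth and calls are sets of (position, kind, length) tuples.
    Assumption: a call is a true positive only on an exact match.
    """
    counts = {"short": Counter(), "long": Counter()}
    for indel in truth:
        outcome = "tp" if indel in calls else "fn"
        counts[size_class(indel[2])][outcome] += 1
    for indel in calls - truth:
        counts[size_class(indel[2])]["fp"] += 1

    report = {}
    for cls, c in counts.items():
        tp, fn, fp = c["tp"], c["fn"], c["fp"]
        report[cls] = {
            # sensitivity = TP / (TP + FN)
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            # false-discovery rate = FP / (TP + FP)
            "fdr": fp / (tp + fp) if tp + fp else None,
        }
    return report


# Toy data for illustration only.
truth = {(100, "DEL", 3), (500, "INS", 12), (900, "DEL", 150), (1500, "INS", 2321)}
calls = {(100, "DEL", 3), (500, "INS", 12), (900, "DEL", 150), (700, "INS", 1)}
report = evaluate(truth, calls)
print(report)
```

In this toy case the hypothetical caller finds both short indels but adds one spurious short call, and it misses one of the two long indels, so the two size classes end up with different sensitivity and false-discovery values. Real evaluations often tolerate small positional offsets, because the same indel can be represented at several equivalent positions within repetitive sequence.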

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Biotechnology - Volume 250, 20 May 2017, Pages 11-15
Authors
, ,