Article code: 2820668
Journal code: 1160879
Year of publication: 2014
English article: 6 pages, PDF
Full-text version: free download
English title of the ISI article
Large scale comparison of non-human sequences in human sequencing data
Persian translation of the title
مقایسه مقادیر غیر انسانی در داده های توالی انسانی (literally: "Comparison of non-human values in human sequence data")
Related subjects
Life Sciences and Biotechnology; Biochemistry, Genetics and Molecular Biology; Genetics
English abstract


• An analysis of non-human sequences in sequencing data from the 1000 Genomes Project
• Identification of commonality in non-human sequences in human genome sequencing data
• Comparison of different ethnicities, sequencing centers and sequencing methods

Several studies have demonstrated that unmapped reads in next generation sequencing data could be used to identify infectious agents or structural variants, but there has been no intensive effort to analyze and classify all non-human sequences found in individual large data sets. To identify commonality in non-human sequences by infectious agents and putative contamination events, we analyzed non-human sequences in 150 genomic sequencing data files from the 1000 Genomes Project and observed that 0.13% of reads on average showed similarities to non-human genomes. We compared results among different sample groups divided based on ethnicities, sequencing centers and enrichment methods (whole genome sequencing vs. exome sequencing) and found that sequencing centers had specific signatures of contaminating genomes as ‘time stamps’. We also observed many unmapped reads that falsely indicated contamination because of the high similarity of human sequences to sequences in non-human genome assemblies such as mouse and Nicotiana.

Publisher
Database: Elsevier - ScienceDirect
Journal: Genomics - Volume 104, Issue 6, Part B, December 2014, Pages 453–458