Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
10825637 | 1064661 | 2014 | 7-page PDF | Free download |
English title of the ISI article
Estimating the composition of species in metagenomes by clustering of next-generation read sequences
Related topics
Life sciences and biotechnology
Biochemistry, genetics and molecular biology
Biochemistry
English abstract
Faster and cheaper sequencing technologies, together with the ability to sequence uncultured microbes collected from any environment, present us with an opportunity to distill meaningful information from the millions of new genomic sequences obtained from environmental samples, called metagenomes. In contrast to data from conventionally cultured microbes, however, metagenomic data is extremely heterogeneous and noisy. Therefore, separating the sets of sequenced genomic fragments that belong to different microbes is essential for successful assembly of microbial genomes. In this paper, we present a novel clustering method for a given metagenomic dataset. A metagenomic dataset has distinguishing features because (i) similar sequence patterns may exist in different species and (ii) each species has a different number of individuals in the dataset. Our method overcomes these obstacles by using a Gaussian mixture model and analysis of mixture profiles, and by taking advantage of genomic signatures extracted from the metagenomic dataset. Unlike conventional clustering methods, where clusters are discovered through global similarities of data instances, our method builds clusters by combining data instances that share local similarities captured by mixture analysis. By considering shared mixture components, our method is able to create clusters of genomic sequences even though they are globally distinct from each other. We applied our method to an artificial metagenomic dataset comprising 47 million simulated reads from 25 real microbial genomes, and analyzed the resulting clusters in terms of the number of clusters, the number of participating species, and the dominant species in each cluster. Even though our approach cannot address all challenges in the field of metagenome sequence clustering, we believe that our method can contribute a step forward toward achieving these goals.
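The idea of combining reads through shared mixture components can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the genomic signature, the mixture fitting, or the merging rule, so everything here is an assumption. The sketch assumes that a signature is a normalized k-mer frequency vector, that each read already has a mixture profile (one weight per Gaussian component, for example posterior probabilities from a fitted Gaussian mixture model), and that two reads belong together when they share at least one component with weight at or above a hypothetical `threshold`. The names `kmer_signature`, `shared_components`, and `cluster_by_shared_components` are illustrative only.

```python
from collections import Counter
from itertools import product


def kmer_signature(read, k=2):
    """Return the normalized k-mer frequency vector of a read.

    Assumption: k-mer frequencies stand in for the "genomic signature";
    k-mers containing characters other than A, C, G, T are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def shared_components(profile, threshold=0.2):
    """Indices of mixture components whose weight is at least the threshold."""
    return {j for j, w in enumerate(profile) if w >= threshold}


def cluster_by_shared_components(profiles, threshold=0.2):
    """Group reads that share a significant mixture component.

    Assumption: merging is transitive, so it is implemented with union-find.
    `profiles` holds one mixture profile (list of component weights) per read.
    """
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}  # component index -> first read seen with that component
    for i, profile in enumerate(profiles):
        for comp in shared_components(profile, threshold):
            if comp in owner:
                parent[find(i)] = find(owner[comp])
            else:
                owner[comp] = i

    groups = {}
    for i in range(len(profiles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical mixture profiles for four reads over three components.
    profiles = [
        [0.9, 0.1, 0.0],
        [0.5, 0.5, 0.0],
        [0.0, 0.6, 0.4],
        [0.0, 0.0, 1.0],
    ]
    print(cluster_by_shared_components(profiles, threshold=0.3))
```

In this toy input the first and last reads have no component in common, yet they end up in the same cluster because intermediate reads link them, which mirrors the abstract's point that sequences can be clustered even when they are globally distinct. In practice the profiles might come from a fitted mixture model, for instance the output of `predict_proba` on scikit-learn's `GaussianMixture` applied to the signature vectors.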
Publisher
Database: Elsevier - ScienceDirect
Journal: Methods - Volume 69, Issue 3, 1 October 2014, Pages 213-219
Authors
Ho-Sik Seok, Woonyoung Hong, Jaebum Kim