Article code: 490365
Journal code: 707359
Publication year: 2013
English article: 6 pages, PDF
Full-text version: free download
English title of the ISI article
A Next-generation Sequence Clustering Method for E. Coli through Proteomics-genomics Data Mapping
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

Recent publications of various ‘omics’ data have created new challenges and opportunities for the development of novel approaches to the assembly of next-generation sequences. To improve the quality of assembled sequences, we developed a next-generation sequence clustering method that uses the interdependency between genomics and proteomics data, which has not been well utilized so far in this field. Given a set of next-generation read sequences and a number of protein sequences, our method clusters the read sequences by mapping them to the protein sequences. As a preliminary study, we selected Escherichia coli (E. coli) as our target species and simulated next-generation reads of E. coli to evaluate our method by analyzing the actual adjacency of the clustered reads in the E. coli genome. We found that (i) a read base matching (RBM) ratio, which represents the number of bases in a read that are mapped to a protein sequence, higher than 50∼70% is a useful criterion for effective read clustering and (ii) a higher RBM ratio does not always lead to better cluster quality in the case of E. coli. These preliminary results demonstrate that the integrative approach is simple yet has great potential for clustering adjacent reads in a genome.
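
The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hit` record, the `rbm_ratio` and `cluster_reads` names, and the toy numbers are all hypothetical. It also assumes that the RBM ratio is the mapped base count divided by the read length, and that each read joins the cluster of the single protein it maps to best, neither of which the abstract states explicitly.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One read-to-protein mapping result (hypothetical record layout)."""
    read_id: str
    protein_id: str
    read_length: int   # total number of bases in the read
    mapped_bases: int  # bases of the read that mapped to the protein


def rbm_ratio(hit: Hit) -> float:
    # Assumed definition: fraction of the read's bases mapped to the protein.
    return hit.mapped_bases / hit.read_length


def cluster_reads(hits, min_rbm=0.5):
    """Group reads by protein, keeping only hits with RBM ratio above min_rbm."""
    best = {}
    for hit in hits:
        ratio = rbm_ratio(hit)
        if ratio <= min_rbm:
            continue  # below the RBM criterion, so the hit is ignored
        current = best.get(hit.read_id)
        # Assumption: a read belongs to the protein with its highest RBM ratio.
        if current is None or ratio > rbm_ratio(current):
            best[hit.read_id] = hit
    clusters = defaultdict(list)
    for read_id, hit in best.items():
        clusters[hit.protein_id].append(read_id)
    return {protein: sorted(reads) for protein, reads in clusters.items()}


# Toy data, invented for illustration only.
hits = [
    Hit("r1", "P1", 100, 90),
    Hit("r2", "P1", 100, 60),
    Hit("r3", "P2", 100, 40),
    Hit("r3", "P1", 100, 55),
    Hit("r4", "P2", 100, 75),
]
clusters = cluster_reads(hits, min_rbm=0.5)
```

Raising `min_rbm` toward 0.7 shrinks the clusters, which mirrors the trade-off the abstract reports between the RBM threshold and cluster quality.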

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 23, 2013, Pages 96-101