Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
9127016 | Gene | 2005 | 7 Pages |
Abstract
GC level is a key feature of prokaryotic genomes. Although it is widely employed in evolutionary studies, new insights remain limited because relatively few genomes have been characterized. Since public databases mainly comprise several hundred prokaryotes with a low number of sequences per genome, a reliable prediction method based on the available sequences may be useful for studies that need a trustworthy estimate of whole-genome GC content. Because the analysis of completely sequenced genomes shows great variability in the shape of the GC distributions, it is of interest to compare different estimators. Our analysis shows that the mean of the GC values of a random sample of genes is a reasonable estimator, given the simplicity of the calculation and its overall performance. However, the available sequences usually come from a process that cannot be considered random sampling. When we analyzed two introduced sources of bias (gene length and protein functional category), we detected an additional bias in the estimate in some cases, although the precision was not affected. We conclude that the mean genic GC level of a sample of 10 genes is a reliable estimator of genomic GC content, with accuracy comparable to that of many widely employed experimental methods.
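The estimator described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `gc_fraction` and `estimate_genomic_gc`, the handling of ambiguous bases, and the use of an unweighted mean over genes are assumptions made for illustration only.

```python
import random
from statistics import mean


def gc_fraction(seq):
    """Return the fraction of G and C among the unambiguous A/C/G/T bases.

    Assumption: ambiguous symbols (e.g. N) are ignored rather than counted.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / (gc + at)


def estimate_genomic_gc(genes, sample_size=10, rng=None):
    """Estimate genomic GC as the mean GC of a random sample of genes.

    Assumptions: `genes` is a list of nucleotide strings, genes are drawn
    without replacement, and each gene contributes equally to the mean
    (no weighting by gene length).
    """
    if len(genes) < sample_size:
        raise ValueError("fewer genes available than the requested sample size")
    rng = rng or random.Random()
    sample = rng.sample(genes, sample_size)
    return mean(gc_fraction(gene) for gene in sample)
```

Passing a seeded generator, for example `estimate_genomic_gc(genes, rng=random.Random(1))`, makes the draw reproducible. Because the abstract notes that real sequence collections are rarely a random sample, a sketch like this only reflects the idealized random-sampling case.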
Related Topics
Life Sciences
Biochemistry, Genetics and Molecular Biology
Genetics
Authors
Alejandro Zavala, Hugo Naya, Héctor Romero, Víctor Sabbia, Rosina Piovani, Héctor Musto