Journal: Current Opinion in Structural Biology
Published: 2011
Pages: 6
Article ID: 1979158
Abstract

Metagenomic sequencing projects have dramatically increased our knowledge of the protein universe, contributing more than half of all currently known protein sequences. They have also brought much broader phylogenetic diversity into the protein databases. Full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, which likely represent novel functions specific to particular environments. At the same time, deeper analysis of these novel families, including experimental structure determination of some representatives, suggests that most of them are distant homologs of already characterized protein families. Thus, most of the protein diversity found in these new environments is due to functional divergence of known protein families rather than the emergence of new ones.

► The largest one-time increases in the protein databases came from metagenomics projects.
► Unbiased sampling provides a glimpse of the real-life gene content of an environment.
► Thousands of new protein families have been discovered in metagenomics projects.
► Metagenomic datasets are dominated by divergent members of known protein families.
► Pilot studies of new families reveal connections to previously studied families.
