Article ID Journal Published Year Pages File Type
2818317 Gene 2011 5 Pages PDF
Abstract

Linguistic (word count) analysis of prokaryotic genome sequences, by Shannon N-gram extension, reveals that the dominant hidden motifs in A + T rich genomes are T(A)(T)A and G(A)(T)C with uncertain number of repeating A and T. Since prokaryotic sequences are largely protein-coding, the motifs would correspond to amphipathic alpha-helices with alternating lysine and phenylalanine as preferential polar and non-polar residues. The motifs are also known in eukaryotes, as nucleosome positioning patterns. Their existence in prokaryotes as well may serve for binding of histone-like proteins to DNA. In this case the above patterns in prokaryotes may be considered as “anticipated” nucleosome positioning patterns which, quite likely, existed in prokaryotic genomes before the evolutionary separation between eukaryotes and prokaryotes.

Related Topics
Life Sciences Biochemistry, Genetics and Molecular Biology Genetics
Authors
, ,