Article ID | Journal | Published Year | Pages |
---|---|---|---|
2815462 | Gene | 2016 | 7 Pages |
• Relationships between microsatellite repeats and genome features were analyzed.
• The incidence, complexity and location of compound microsatellites were investigated.
• Mono-, di- and trinucleotide repeats favor A/T-rich motifs.
• Reference sequences contain fewer mononucleotide and dinucleotide repeats than corresponding random sequences.
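The last two highlights concern motif composition and a comparison of reference sequences against random ones. As a minimal sketch, assuming perfect (uninterrupted) repeats, hypothetical minimum repeat counts, and a composition-preserving shuffle as the random-sequence model, such counts could be obtained as follows. The study's own detection tool, thresholds and randomization method are not stated here, and the function names (`find_ssrs`, `count_by_motif_length`, `shuffled_baseline`) are hypothetical.

```python
import random
from collections import Counter

# Hypothetical minimum repeat counts per motif length (1 = mono-, 2 = di-,
# 3 = trinucleotide). Illustrative assumptions, not the study's thresholds.
MIN_REPEATS = {1: 6, 2: 3, 3: 3}
ACGT = set("ACGT")


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif) tuples for perfect tandem repeats.

    Scans left to right; at each position the shortest motif whose
    uninterrupted run meets its threshold is reported, and scanning
    resumes after that run. `end` is exclusive.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        step = 1
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k:
                break
            # Skip non-ACGT motifs and degenerate ones such as "AA" or "AAA",
            # which are really mononucleotide runs.
            if not set(motif) <= ACGT or (k > 1 and len(set(motif)) == 1):
                continue
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            if (j - i) // k >= min_repeats[k]:
                hits.append((i, j, motif))
                step = j - i
                break
        i += step
    return hits


def count_by_motif_length(seq):
    """Number of SSRs per motif length (1, 2, 3)."""
    return Counter(len(motif) for _, _, motif in find_ssrs(seq))


def shuffled_baseline(seq, n_shuffles=100, seed=0):
    """Mean SSR count per motif length over shuffled copies of `seq`.

    Shuffling keeps base composition but destroys base order; it is one
    simple random-sequence model, assumed here for illustration.
    """
    rng = random.Random(seed)
    bases = list(seq)
    totals = Counter()
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        totals.update(count_by_motif_length("".join(bases)))
    return {k: totals[k] / n_shuffles for k in sorted(MIN_REPEATS)}


if __name__ == "__main__":
    genome = "ATGAAAAAAAGCTATATATGCAAGAAGAAGTT"  # toy sequence, not phage data
    print(find_ssrs(genome))
    # [(3, 10, 'A'), (12, 18, 'TA'), (21, 30, 'AAG')]
    print(dict(count_by_motif_length(genome)))
    # {1: 1, 2: 1, 3: 1}
    print(shuffled_baseline(genome))
```

Under this model, the reported pattern would appear as reference counts for motif lengths 1 and 2 falling below the shuffled means; on a toy sequence this short the comparison is not meaningful, and the choice of null model can change the result.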
Microsatellites, or simple sequence repeats (SSRs), are known to occur ubiquitously in the genomes of eukaryotes, prokaryotes and viruses. A comprehensive analysis of microsatellites and compound microsatellites (CMs) was performed for 67 T4-like bacteriophage genomes. We found that the number of repeats was generally proportional to genome size. CMs were more numerous in genic regions, whereas their relative abundance was higher in intergenic regions. The number of CMs decreased rapidly with increasing complexity but increased gradually with increasing dMAX (the maximum distance allowed between two adjacent microsatellites). (A)n/(T)n, (AT)n/(TA)n and (AAG)n were the most abundant mono-, di- and trinucleotide repeats, respectively. The number of microsatellites in the reference sequences was significantly lower than in the corresponding random sequences. This difference was mainly attributable to mono- and dinucleotide repeats, which rarely exceeded 6 bp in T4-like viruses. These observations may help in understanding the distribution of microsatellites and the genetic diversity of T4-like viruses.
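The abstract tracks how the number of CMs varies with complexity and with dMAX. The sketch below assumes that a CM is a run of microsatellites in which each gap to the next one is at most dMAX, and that complexity is the number of microsatellites in that run. Both definitions, and the function names `compound_microsatellites` and `complexity_profile`, are assumptions for illustration rather than the study's code.

```python
from collections import Counter


def compound_microsatellites(ssrs, dmax):
    """Group microsatellites into compound microsatellites (CMs).

    ssrs: (start, end) intervals of individual microsatellites, end-exclusive.
    Adjacent microsatellites are merged when the gap between them
    (next_start - prev_end) is at most dmax (assumed meaning of dMAX).
    Returns the CMs as lists of member intervals; a CM needs >= 2 members.
    """
    groups = []
    for start, end in sorted(ssrs):
        if groups and start - groups[-1][-1][1] <= dmax:
            groups[-1].append((start, end))
        else:
            groups.append([(start, end)])
    return [g for g in groups if len(g) >= 2]


def complexity_profile(ssrs, dmax):
    """Count CMs by complexity (number of member microsatellites)."""
    return Counter(len(cm) for cm in compound_microsatellites(ssrs, dmax))


if __name__ == "__main__":
    # Toy intervals, not data from the study.
    ssrs = [(3, 10), (12, 18), (21, 30), (60, 68)]
    for d in (0, 2, 3, 30):
        print(d, len(compound_microsatellites(ssrs, d)), dict(complexity_profile(ssrs, d)))
    # 0 0 {}
    # 2 1 {2: 1}
    # 3 1 {3: 1}
    # 30 1 {4: 1}
```

A larger dmax bridges wider gaps, so it can both create new CMs and merge existing ones into more complex CMs; the net effect on CM counts in real genomes is an empirical question, which the abstract reports as a gradual increase.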