Article ID: 10351538
Journal: Computers in Biology and Medicine
Published Year: 2012
Pages: 7
File Type: PDF
Abstract
Based on the repetition patterns of single nucleotides, we propose a novel digital representation for characterizing primary DNA sequences. From this representation we derive a new RP-SP (repeat and space) vector for computing the distance between sequences. An examination of similarities and dissimilarities among sequences illustrates the utility of the proposed RP-SP vector distance. We then apply the RP-SP vector method to two groups of genomes: 15 E. coli genomes and 31 mitochondrial genomes. For comparison, we also apply other alignment-free methods to both groups. The results show that the proposed method can distinguish the characteristics of different genomes and can be used to reconstruct their phylogenetic tree.
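The abstract does not define the RP-SP vector, so the following is only an illustrative sketch of the general idea of a repeat-and-space, alignment-free distance, not the authors' method. It assumes that "repeat" means the lengths of maximal runs of one nucleotide and that "space" means the lengths of the gaps between consecutive runs of that nucleotide. It summarizes each by its mean and compares sequences by Euclidean distance. All function names (`run_lengths`, `gap_lengths`, `feature_vector`, `distance`) and the choice of summary statistic are hypothetical.

```python
from math import sqrt

NUCLEOTIDES = "ACGT"


def run_lengths(seq, base):
    """Lengths of maximal runs of `base` in `seq` (assumed 'repeat' part)."""
    runs, current = [], 0
    for ch in seq:
        if ch == base:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def gap_lengths(seq, base):
    """Lengths of gaps between consecutive runs of `base` (assumed 'space' part)."""
    gaps, current, seen = [], 0, False
    for ch in seq:
        if ch == base:
            if seen and current:
                gaps.append(current)
            seen, current = True, 0
        elif seen:
            current += 1
    return gaps


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def feature_vector(seq):
    """8-dimensional vector: mean run length and mean gap length per nucleotide."""
    seq = seq.upper()
    vec = []
    for base in NUCLEOTIDES:
        vec.append(mean(run_lengths(seq, base)))
        vec.append(mean(gap_lengths(seq, base)))
    return vec


def distance(a, b):
    """Euclidean distance between the feature vectors of two sequences."""
    va, vb = feature_vector(a), feature_vector(b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

A pairwise matrix of such distances over a set of genomes could then be passed to a standard tree-building procedure. The paper's actual vector definition and distance may differ substantially from this sketch.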