Article ID: 2075931
Journal: Biosystems
Published Year: 2014
Pages: 6 Pages
File Type: PDF
Abstract

Identification of replication origins is crucial for the faithful duplication of genomic DNA. In the present work, we analyzed the frequencies of single nucleotides and dinucleotides, the GC/AT bias, and the GC/AT profile in the vicinity of Arabidopsis thaliana replication origins. The guanine and cytosine contents are higher in origins of replication (Ori) than in non-Ori regions. The SS (S = G or C) dinucleotides are favoured in Ori, whereas the WW (W = A or T) dinucleotides are favoured in non-Ori. The GC/AT bias and GC/AT profile in Ori are significantly different from those in non-Ori. Furthermore, by inputting DNA sequence features into a support vector machine (SVM), we distinguished between the Ori and non-Ori regions in A. thaliana. The total prediction accuracy is about 69.5%, as evaluated by 10-fold cross-validation. This result suggests that, apart from DNA sequence, deciphering the selection of replication origins must integrate many other factors, including nucleosome positioning, DNA methylation, and histone modification. In addition, a comparison of predictive performance showed that the SVM is significantly more accurate when using sequence features in the context of the WS language than when using those of the RY language. The same conclusion was also obtained for S. cerevisiae and D. melanogaster.
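
The sequence features named in the abstract can be illustrated with a minimal sketch. It assumes the standard IUPAC groupings (W = A or T, S = G or C for the WS language; R = A or G, Y = C or T for the RY language) and uses the common skew form (x - y) / (x + y) as a stand-in for the GC/AT bias, since the abstract does not give the paper's exact definition. The function names and the example sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Assumed IUPAC groupings (not spelled out in full in the abstract):
WS_MAP = str.maketrans("ACGT", "WSSW")  # A,T -> W (weak); C,G -> S (strong)
RY_MAP = str.maketrans("ACGT", "RYRY")  # A,G -> R (purine); C,T -> Y (pyrimidine)


def recode(seq, table):
    """Rewrite a DNA sequence in a two-letter language (WS or RY).

    Characters outside A/C/G/T (for example N) are left unchanged.
    """
    return seq.upper().translate(table)


def kmer_frequencies(seq, k=2):
    """Relative frequencies of overlapping k-mers (dinucleotides when k=2)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def skew(seq, x, y):
    """Bias between two symbols, (x - y) / (x + y); an assumed definition."""
    cx, cy = seq.count(x), seq.count(y)
    return (cx - cy) / (cx + cy) if cx + cy else 0.0


example = "ATGCGCGGTA"                    # hypothetical toy sequence
ws = recode(example, WS_MAP)              # sequence in the WS language
ry = recode(example, RY_MAP)              # sequence in the RY language
ws_dinucleotides = kmer_frequencies(ws)   # includes SS and WW frequencies
gc_bias = skew(example, "G", "C")
at_bias = skew(example, "A", "T")
```

Feature vectors of this kind, computed per Ori and non-Ori region, are what would then be passed to an SVM classifier and scored by 10-fold cross-validation; that step is omitted here because the abstract does not specify the kernel or parameters.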

Related Topics
Physical Sciences and Engineering > Mathematics > Modelling and Simulation