Article ID: 2819000
Journal: Gene
Published Year: 2009
Pages: 15
File Type: PDF
Abstract

We present a computationally efficient model for parameterizing DNA sequences in a way that comprehensively describes their auto- and cross-correlation structure. The approach is based on a four-channel multivariate autoregressive (MVAR) model. The model was applied to genes from the globin family in six vertebrate species. First, each sequence was coded as four signals (one per nucleotide), to which a four-channel MVAR model was fitted. The vectors of model coefficients were calculated from the correlation matrices as functions of nucleotide distance. Differences between chromosomes and between species were best distinguished by the cross-coefficients, which link the signals of different nucleotides. For clustering, several metrics were tested, and two clustering procedures (Nearest Neighbor and UPGMA) were then applied. Clustering trees and consensus trees were constructed for exons, introns and whole genes. The results agreed with the known dependencies between the chromosomes of the globin family. Orthologous genes from different species were grouped together, and within these groups phylogenetically close organisms were placed near one another.
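
The coding and fitting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states that coefficients were derived from correlation matrices, whereas this sketch swaps in an ordinary least-squares fit for brevity. The function names (`encode`, `fit_mvar`, `coefficient_distance`), the model order, and the Euclidean metric are all assumptions made for illustration; the abstract does not say which metrics were tested.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # assumed channel order; the abstract does not specify one


def encode(seq):
    """Return a (len(seq), 4) array of 0/1 indicator signals, one column per nucleotide.

    Characters outside A, C, G, T (for example N) give an all-zero row.
    """
    seq = seq.upper()
    return np.array([[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq])


def fit_mvar(signals, order):
    """Least-squares fit of x[t] = sum_k A[k] @ x[t-k] + e[t] on mean-removed signals.

    Returns an array of shape (order, 4, 4). Entry [k-1, i, j] couples channel j
    at lag k to channel i: the diagonal holds auto-coefficients, the off-diagonal
    holds cross-coefficients.
    """
    x = signals - signals.mean(axis=0)
    n, m = x.shape
    # Regressor matrix: lag-1 block, lag-2 block, ... stacked side by side.
    past = np.hstack([x[order - k : n - k] for k in range(1, order + 1)])
    target = x[order:]
    # The four indicator channels sum to a constant, so the system is rank
    # deficient; lstsq returns the minimum-norm solution in that case.
    coef, *_ = np.linalg.lstsq(past, target, rcond=None)
    return coef.reshape(order, m, m).transpose(0, 2, 1)


def coefficient_distance(a, b):
    """Euclidean distance between two coefficient arrays (an assumed metric)."""
    return float(np.linalg.norm(a - b))


# Hypothetical toy sequences, not data from the study.
toy_sequences = {
    "seq1": "ACGTACGTTGCAACGTAGCTAGCTAGGATCCA" * 4,
    "seq2": "ACGTACGATGCAACGTAGCTTGCTAGGATCGA" * 4,
    "seq3": "GGGCCCATATATGGGCCCATATTTAAGGCCAA" * 4,
}
models = {name: fit_mvar(encode(s), order=3) for name, s in toy_sequences.items()}
names = list(models)
distances = np.array(
    [[coefficient_distance(models[a], models[b]) for b in names] for a in names]
)
```

A distance matrix like `distances` could then be passed to a hierarchical clustering routine. In SciPy, for instance, `scipy.cluster.hierarchy.linkage` with `method="average"` corresponds to UPGMA and `method="single"` to nearest-neighbor clustering.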

Related Topics
Life Sciences > Biochemistry, Genetics and Molecular Biology > Genetics