Article ID: 5590117
Journal: Genomics
Published Year: 2016
Pages: 9
File Type: PDF
Abstract

• Complex number encoding of DNA sequences by Chaos Game Representation is proposed.
• Fourier power spectra of DNA sequences are computed from the complex number encoding.
• Alignment-free analysis of DNA sequences using Fourier power spectra is proposed.

Numerical encoding, in which numerical values are associated with the corresponding symbolic characters, plays an important role in the computational analysis of DNA sequences. Once a sequence has a numerical representation, digital signal processing methods can be used to analyze it. To reflect the biological properties of the original sequence, it is vital that the representation be one-to-one. Chaos Game Representation (CGR) is an iterative mapping technique that assigns each nucleotide in a DNA sequence a position in the plane, so that the sequence can be depicted as an image. Using CGR, a biological sequence can be transformed one-to-one into a numerical sequence that preserves the main features of the original sequence. In this research, we propose to encode DNA sequences by treating their 2D CGR coordinates as complex numbers, and we apply digital signal processing methods to analyze the evolutionary relationships among the sequences. Computational experiments indicate that this approach gives results comparable to those of the state-of-the-art multiple sequence alignment method, Clustal Omega, and is significantly faster. The MATLAB code for our method can be accessed from: www.mathworks.com/matlabcentral/fileexchange/57152
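The encoding described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation. The corner assignment (A, C, G, T on the corners of the unit square), the starting point at the center of the square, and the function names `cgr_complex`, `power_spectrum` and `decode` are all assumptions made for illustration. The sketch shows the three ideas involved: each CGR point is stored as a complex number x + iy, the original sequence can be read back from the points (the one-to-one property), and a Fourier power spectrum is computed from the complex signal.

```python
import cmath

# Assumed corner assignment on the unit square (a common CGR convention);
# the abstract does not state which corners the paper uses.
CORNERS = {"A": 0 + 0j, "C": 0 + 1j, "G": 1 + 1j, "T": 1 + 0j}


def cgr_complex(seq, start=0.5 + 0.5j):
    """Map a DNA string to its CGR points, stored as complex numbers x + iy."""
    points = []
    z = start
    for base in seq.upper():
        z = (z + CORNERS[base]) / 2  # move halfway toward the base's corner
        points.append(z)
    return points


def decode(points):
    """Recover the sequence: the quadrant of each point identifies its base.

    This holds in exact arithmetic; very long runs of a single base can
    exhaust floating-point precision.
    """
    bases = []
    for z in points:
        if z.real < 0.5:
            bases.append("A" if z.imag < 0.5 else "C")
        else:
            bases.append("T" if z.imag < 0.5 else "G")
    return "".join(bases)


def power_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform power spectrum |X_k|^2."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        x_k = sum(z * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, z in enumerate(signal))
        spectrum.append(abs(x_k) ** 2)
    return spectrum


points = cgr_complex("ACGT")
spectrum = power_spectrum(points)
```

The naive transform keeps the sketch free of dependencies; for real sequences an FFT routine would be the practical choice. How spectra of sequences with different lengths are compared is not covered by the abstract and is left out here.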
