Article ID: 559301
Journal: Digital Signal Processing
Published Year: 2015
Pages: 9
File Type: PDF
Abstract

In this paper, we propose a new method for predicting protein-coding regions that is designed to detect novel genes with no known close homologs. The method first converts DNA sequences into numerical form using a dynamic representation scheme, and then computes the period-3 spectrum from the nucleotide distribution variance. The dynamic representation scheme assigns numerical pairs to the nucleotides so as to emphasize those that contribute more strongly to the period-3 spectrum. Extracting the period-3 spectrum from the nucleotide distribution variance has a lower computational cost than using the Fourier transform. The period-3 spectrum signal is then post-processed to smooth it, detect its peaks, and locate the boundaries of the protein-coding regions. Analysis of the receiver operating characteristic (ROC) curves shows that the proposed method outperforms other digital signal processing (DSP)-based methods. Analysis of the false-positive peaks shows that the corresponding regions resemble regions that carry functional patterns in other DNA sequences. The method also highlights and explores the capabilities of techniques that perform better than homology-based techniques for de novo protein prediction. We believe that this area of research has been underemphasized and deserves additional attention.
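
To make the variance-based idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain count-based representation (the abstract does not specify the numerical pairs of the dynamic scheme), and the function name `period3_variance`, the window length, and the step size are illustrative choices. For each sliding window, it counts how often each nucleotide occurs at the three codon positions and sums the variance of those counts. A strong period-3 bias yields a high score, and no Fourier transform is needed.

```python
from collections import Counter


def period3_variance(seq, window=351, step=3):
    """Score period-3 strength in sliding windows of a DNA sequence.

    Hypothetical sketch: uses raw nucleotide counts per codon position,
    not the dynamic numerical-pair representation from the paper.
    Returns a list of (window_start, score) tuples.
    """
    seq = seq.upper()
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        # Count nucleotides separately at codon positions 0, 1 and 2.
        counts = [Counter(win[p::3]) for p in range(3)]
        score = 0.0
        for nucleotide in "ACGT":
            c = [counts[p][nucleotide] for p in range(3)]
            mean = sum(c) / 3
            # Population variance of the three positional counts.
            score += sum((x - mean) ** 2 for x in c) / 3
        scores.append((start, score))
    return scores


# A sequence with perfect period-3 structure versus one with period 4.
periodic = period3_variance("ATG" * 40, window=24)
aperiodic = period3_variance("ACGT" * 30, window=24)
```

In this toy comparison the period-4 sequence distributes every nucleotide evenly over the three codon positions, so its score is zero, while the repeated codon scores well above zero. The smoothing, peak detection, and boundary location steps described in the abstract are not covered by this sketch.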

Related Topics
Physical Sciences and Engineering > Computer Science > Signal Processing