Journal: Pattern Recognition
Published: 2012
Article ID: 530834
Pages: 9
Abstract

Spectrum analysis approaches, such as the Fourier transform, the wavelet transform and the autoregressive model, have been applied successfully to exon prediction because they are flexible and require no training data or prior knowledge. Detecting short exons, however, remains difficult: traditional methods often give unsatisfactory results because they fail to identify the spectral patterns of short exons correctly. In this article, we propose an improved exon prediction method based on empirical mode decomposition (EMD) and the Fourier transform. The proposed approach represents DNA sequences numerically by their structural features, which can reveal significant patterns that are rarely observed with traditional methods. Probable exons are detected from the structural profile by examining the peaks of the local spectrum at frequency 1/3 within a sliding window. The data in each window are first decomposed by EMD into a set of intrinsic mode functions (IMFs); the first IMF is then used to compute the local spectrum with the fast Fourier transform (FFT). We compare our method with the traditional Fourier transform method using binary representation and with the recently proposed paired spectral content method. Experiments on a randomly selected human genome dataset and the GENSCAN benchmark dataset show that our method can enhance the signal-to-noise ratio of the analyzed sequences and improve the prediction accuracy for short exons.
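The baseline the abstract compares against, the Fourier transform with binary representation, can be sketched as follows. This is a minimal illustration under the common four-indicator formulation, which is an assumption and not taken from this article: each base b in {A, C, G, T} gets an indicator sequence u_b[i] = 1 if position i holds b and 0 otherwise, and a window's spectral content at frequency 1/3 is S = Σ_b |U_b(1/3)|², where U_b is the DFT of u_b. The function names `period3_power` and `sliding_period3` and the default window length of 351 are hypothetical choices for this sketch; the article's structural-profile representation and the paired spectral content method are not reproduced here.

```python
import cmath


def _kernel(i: int) -> complex:
    # DFT kernel at frequency 1/3 for position i: exp(-2*pi*j*i/3)
    return cmath.exp(-2j * cmath.pi * i / 3)


def period3_power(segment: str) -> float:
    """Power at frequency 1/3 summed over the A/C/G/T binary indicator sequences."""
    segment = segment.upper()
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of u_base[i] = 1 if segment[i] == base else 0
        coeff = sum(_kernel(i) for i, ch in enumerate(segment) if ch == base)
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq: str, window: int = 351, step: int = 1) -> list:
    """Period-3 power per window; entry k covers seq[k*step : k*step + window]."""
    return [period3_power(seq[s:s + window])
            for s in range(0, len(seq) - window + 1, step)]
```

On a coding-like segment such as `"ATG" * 10`, every base falls in a single codon position, so `period3_power` returns about 300; a period-4 segment such as `"ACGT" * 3` gives about 0. Peaks of `sliding_period3` along a sequence mark candidate coding windows; how the peaks are thresholded is left open here.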

► Some structural profiles of exons do exhibit clear three-base periodicity.
► Structural profiles can yield more significant patterns than traditional methods.
► Combining EMD with the FFT can help improve the prediction accuracy of short exons.
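The EMD plus FFT scheme can be illustrated on a single window of a numeric profile. This sketch makes several simplifying assumptions that are not taken from the article: envelopes are piecewise linear (standard EMD interpolates the extrema with cubic splines), envelopes are held constant beyond the first and last extremum, and sifting stops after `max_sifts` rounds or when a Huang-style normalised difference falls below `sd_tol`. The input `window` stands for any real-valued profile; the article's structural feature values are not given here, so any example input is hypothetical, as are the names `first_imf`, `power_at_one_third` and `emd_fft_score`. Evaluating one DFT bin at frequency 1/3 gives the same value as FFT bin N/3 when the window length N is divisible by 3.

```python
import cmath


def _extrema(h):
    """Indices of interior local maxima and minima."""
    maxima = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] >= h[i + 1]]
    minima = [i for i in range(1, len(h) - 1) if h[i - 1] > h[i] <= h[i + 1]]
    return maxima, minima


def _envelope(idx, h):
    """Piecewise-linear envelope through h at positions idx (simplification:
    standard EMD uses cubic splines); held constant past the end extrema."""
    xs, ys = idx, [h[i] for i in idx]
    out, j = [], 0
    for t in range(len(h)):
        if t <= xs[0]:
            out.append(ys[0])
        elif t >= xs[-1]:
            out.append(ys[-1])
        else:
            while xs[j + 1] < t:  # advance to the segment containing t
                j += 1
            frac = (t - xs[j]) / (xs[j + 1] - xs[j])
            out.append(ys[j] + frac * (ys[j + 1] - ys[j]))
    return out


def first_imf(x, max_sifts=10, sd_tol=0.2):
    """First intrinsic mode function of x by sifting (simplified sketch)."""
    h = [float(v) for v in x]
    for _ in range(max_sifts):
        maxima, minima = _extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build both envelopes
        upper, lower = _envelope(maxima, h), _envelope(minima, h)
        # subtract the local mean of the upper and lower envelopes
        new_h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
        # Huang-style stopping rule: normalised squared change between sifts
        sd = sum((a - b) ** 2 / (a * a + 1e-12) for a, b in zip(h, new_h))
        h = new_h
        if sd < sd_tol:
            break
    return h


def power_at_one_third(signal):
    """|X(1/3)|^2: the DFT of the signal evaluated at frequency 1/3."""
    coeff = sum(v * cmath.exp(-2j * cmath.pi * i / 3) for i, v in enumerate(signal))
    return abs(coeff) ** 2


def emd_fft_score(window):
    """Hypothetical per-window score: 1/3-frequency power of the first IMF."""
    return power_at_one_third(first_imf(window))
```

For example, the period-3 pattern `[1, -0.5, -0.5]` repeated 20 times plus a linear trend `0.01 * i` yields a first IMF in which the trend is largely removed, and `emd_fft_score` returns roughly 900, close to the value for the pattern alone. A monotone input has too few extrema to sift and is returned unchanged.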
