Article ID: 8876829
Journal: Journal of Theoretical Biology
Published Year: 2018
Pages: 8 Pages
File Type: PDF
Abstract
Gene recombination is a key process that produces hereditary differences. Identifying recombination spots plays an important role in revealing genome evolution and advancing the study of DNA function. However, traditional experimental methods cannot keep pace with the huge volume of DNA sequences generated by sequencing. Several machine learning methods have been proposed to speed up identification, but they often ignore the correlations between nucleotide pairs at different positions along a DNA sequence, which reflect important sequence-order information. To address this, this study proposes a novel feature extraction method, called iRSpot-ADPM, based on DNA properties in a given DNA sequence. From the original feature set, 85 features are selected according to the weights calculated by a support vector machine. Five-fold cross-validation tests on two widely used benchmark datasets indicate that the proposed method outperforms its existing counterparts on specificity (Spec), Matthews correlation coefficient (MCC), and overall accuracy (OA). The experimental results show that the proposed method is effective for accurate recombination spot identification. Moreover, the proposed method could likely be extended to other biological sequences and prove helpful in future research. The datasets and Matlab source code can be downloaded from http://stxy.neuq.edu.cn/info/1095/1157.htm.
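The abstract does not give formulas for the three reported measures. As a minimal sketch, assuming the standard confusion-matrix definitions of specificity, overall accuracy, and MCC, they could be computed as follows. The function name `evaluation_metrics` and the counts in the usage line are hypothetical and are not taken from the paper or its Matlab code.

```python
import math


def evaluation_metrics(tp, tn, fp, fn):
    """Compute Sn, Spec, OA and MCC from confusion-matrix counts.

    Assumes the standard textbook definitions; the paper's exact
    formulas are not stated in the abstract.
    """
    total = tp + tn + fp + fn
    # Sensitivity: fraction of true recombination hotspots recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives (coldspots) recovered.
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    # Overall accuracy: fraction of all samples classified correctly.
    oa = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Spec": spec, "OA": oa, "MCC": mcc}


# Hypothetical counts, for illustration only.
metrics = evaluation_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

In a five-fold cross-validation setting, one plausible use is to sum the counts over the five held-out folds before calling the function, so that each sample contributes exactly once.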
Related Topics
Life Sciences > Agricultural and Biological Sciences > Agricultural and Biological Sciences (General)