Article ID: 7556824
Journal: Analytical Biochemistry
Published Year: 2018
Pages: 24
File Type: PDF
Abstract
RNA 5-methylcytosine (m5C) is an important post-transcriptional modification that plays an indispensable role in biological processes. Accurately identifying m5C sites from primary RNA sequences is especially useful for understanding the mechanisms and functions of m5C. Because identifying m5C sites with wet-lab techniques is difficult and expensive, fast and accurate machine-learning-based prediction methods are urgently needed. In this study, we propose a new m5C site predictor, M5C-HPCR, which combines a novel heuristic nucleotide physicochemical property reduction (HPCR) algorithm with a classifier ensemble. HPCR extracts multiple reducts of physicochemical properties for encoding discriminative features, and the classifier ensemble integrates multiple base predictors, each trained on a separate reduct obtained from HPCR. Rigorous jackknife tests on two benchmark datasets show that M5C-HPCR outperforms state-of-the-art m5C site predictors, achieving the highest values of MCC (0.859) and AUC (0.962). We also implemented a webserver for M5C-HPCR, which is freely available at http://cslab.just.edu.cn:8080/M5C-HPCR/.
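The general idea of training one base predictor per property subset and averaging their outputs can be sketched as follows. This is a minimal illustration under stated assumptions, not the M5C-HPCR implementation: the three binary nucleotide properties (ring structure, functional group, hydrogen bonding), the hand-picked subsets standing in for HPCR reducts, the toy nearest-centroid base classifier, and all function names and example sequences are hypothetical choices made for this sketch. The abstract does not specify the base classifier or the reduction procedure. The MCC function follows the standard definition of the Matthews correlation coefficient.

```python
from math import sqrt

# Assumed illustrative property table (not taken from the paper):
# three binary chemical properties commonly used to encode nucleotides.
PROPERTIES = {
    "ring":    {"A": 1, "C": 0, "G": 1, "U": 0},  # purine (1) vs pyrimidine (0)
    "amino":   {"A": 1, "C": 1, "G": 0, "U": 0},  # amino (1) vs keto (0)
    "weak_hb": {"A": 1, "C": 0, "G": 0, "U": 1},  # weak (1) vs strong (0) H-bond
}


def encode(seq, subset):
    """Encode each nucleotide using only the properties named in `subset`."""
    return [float(PROPERTIES[p][nt]) for nt in seq for p in subset]


class CentroidClassifier:
    """Toy base predictor: higher score means closer to the positive centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def score(self, x):
        dist = {
            label: sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))
            for label, c in self.centroids.items()
        }
        total = dist[0] + dist[1]
        return 0.5 if total == 0 else dist[0] / total


def train_ensemble(seqs, labels, subsets):
    """Train one base predictor per property subset (a stand-in for a reduct)."""
    return [
        (subset, CentroidClassifier().fit([encode(s, subset) for s in seqs], labels))
        for subset in subsets
    ]


def predict_score(ensemble, seq):
    """Average the base predictors' scores; the result lies in [0, 1]."""
    scores = [clf.score(encode(seq, subset)) for subset, clf in ensemble]
    return sum(scores) / len(scores)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up toy windows centred on a cytosine; labels are arbitrary.
SEQS = ["GGCUG", "GACUG", "UUCAA", "AUCAA"]
LABELS = [1, 1, 0, 0]
SUBSETS = [("ring", "amino"), ("amino", "weak_hb"), ("ring", "weak_hb")]

if __name__ == "__main__":
    ensemble = train_ensemble(SEQS, LABELS, SUBSETS)
    preds = [1 if predict_score(ensemble, s) > 0.5 else 0 for s in SEQS]
    print("predictions:", preds)
    print("training MCC:", mcc(LABELS, preds))
```

Averaging scores is only one way to combine base predictors; weighted voting or stacking would fit the same structure. Evaluating on the training sequences, as the usage lines do, only checks that the pieces fit together and says nothing about predictive performance, which the study assesses with jackknife tests.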