کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
469417 698316 2015 13 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Predicting protein-binding RNA nucleotides with consideration of binding partners
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر علوم کامپیوتر (عمومی)
پیش نمایش صفحه اول مقاله
Predicting protein-binding RNA nucleotides with consideration of binding partners
چکیده انگلیسی


• A method to predict protein-binding sites in RNA from sequence data alone is proposed.
• Protein-binding sites in RNA are predicted with consideration of protein in addition to RNA.
• This is the first sequence-based prediction of protein-binding sites in RNA at the nucleotide-level.
• The prediction method achieved an accuracy of 79.2% and MCC of 0.48 in independent testing.

In recent years several computational methods have been developed to predict RNA-binding sites in protein. Most of these methods do not consider interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. Unlike the problem of predicting RNA-binding sites in protein, the problem of predicting protein-binding sites in RNA has received little attention mainly because it is much more difficult and shows a lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. In an effort to improve the prediction accuracy and usefulness of the previous method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and Matthews correlation coefficient (MCC) of 0.69 in a 10-fold cross validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, a NPV of 84.2% and MCC of 0.48 in independent testing. For comparative purpose, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In a 10 fold-cross validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, a NPV of 92.2% and MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, a NPV of 85.2% and MCC of 0.45. In both cross-validations and independent testing, the new model that used both RNA and protein sequences showed a better performance than the model that used RNA sequence data alone in most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA which considers the binding partner of RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computer Methods and Programs in Biomedicine - Volume 120, Issue 1, June 2015, Pages 3–15
نویسندگان
, , , ,