کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
10351664 | 864509 | 2013 | 11 صفحه PDF | دانلود رایگان |
عنوان انگلیسی مقاله ISI
Predicting protein-binding RNA nucleotides using the feature-based removal of data redundancy and the interaction propensity of nucleotide triplets
دانلود مقاله + سفارش ترجمه
دانلود مقاله ISI انگلیسی
رایگان برای ایرانیان
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه
مهندسی کامپیوتر
نرم افزارهای علوم کامپیوتر
پیش نمایش صفحه اول مقاله

چکیده انگلیسی
Several learning approaches have been used to predict RNA-binding amino acids in a protein sequence, but there has been little attempt to predict protein-binding nucleotides in an RNA sequence. One of the reasons is that the differences between nucleotides in their interaction propensity are much smaller than those between amino acids. Another reason is that RNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding RNA nucleotides is much harder than predicting RNA-binding amino acids. We developed a new method that removes data redundancy in a training set of sequences based on their features. The new method constructs a larger and more informative training set than the standard redundancy removal method based on sequence similarity, and the constructed dataset is guaranteed to be redundancy-free. We computed the interaction propensity (IP) of nucleotide triplets by applying a new definition of IP to an extensive dataset of protein-RNA complexes, and developed a support vector machine (SVM) model to predict protein binding sites in RNA sequences. In a 5-fold cross-validation with 812 RNA sequences, the SVM model predicted protein-binding nucleotides with an accuracy of 86.4%, an F-measure of 84.8%, and a Matthews correlation coefficient of 0.66. With an independent dataset of 56 RNA sequences that were not used in training, the resulting accuracy was 68.1% with an F-measure of 71.7% and a Matthews correlation coefficient of 0.35. To the best of our knowledge, this is the first attempt to predict protein-binding RNA nucleotides in a given RNA sequence from the sequence data alone. The SVM model and datasets are freely available for academics at http://bclab.inha.ac.kr/primer.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computers in Biology and Medicine - Volume 43, Issue 11, November 2013, Pages 1687-1697
Journal: Computers in Biology and Medicine - Volume 43, Issue 11, November 2013, Pages 1687-1697
نویسندگان
Sungwook Choi, Kyungsook Han,