Article code | Journal code | Publication year | English article | Full-text version
388502 | 660926 | 2011 | 11 pages, PDF | Free download
English title of the ISI article
PolyA-iEP: A data mining method for the effective prediction of polyadenylation sites
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

This paper presents a study of polyadenylation site prediction, an important problem in bioinformatics and medicine that promises answers especially in cancer research. We describe PolyA-iEP, a method we developed for predicting polyadenylation sites, and use it in a systematic study of the problem of recognizing mRNA 3′ ends that contain a polyadenylation site. PolyA-iEP is a modular system consisting of two main components, both of which contribute substantially to its descriptive and predictive potential. Specifically, PolyA-iEP exploits the advantages of emerging patterns, namely high understandability and discriminating power, together with the strength of a distance-based scoring method that we propose. The extracted emerging patterns may span many elements around the polyadenylation site and can provide novel and interesting biological insights. The outputs of the two components are finally combined by a classifier in a highly effective framework, which in our setup reaches a sensitivity of 93.7% and a specificity of 88.2%. PolyA-iEP can be parameterized and used for both descriptive and predictive analysis. We evaluated our method on Arabidopsis thaliana sequences and drew important conclusions.
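
The abstract does not say how emerging patterns are mined. As a rough illustration of the general idea only, an emerging pattern is commonly defined as a pattern whose support differs sharply between two classes, measured by a growth rate (the ratio of the two supports). The sketch below assumes, for simplicity, that a pattern is a plain substring and that the two classes are toy lists of sequences; the function names, the toy sequences, and the substring-based notion of support are assumptions made for illustration, not the representation used by PolyA-iEP.

```python
def support(pattern, sequences):
    """Fraction of sequences that contain `pattern` as a substring."""
    if not sequences:
        return 0.0
    return sum(pattern in seq for seq in sequences) / len(sequences)


def growth_rate(pattern, positives, negatives):
    """Ratio of the pattern's support in the positive class to its
    support in the negative class (infinite if absent from negatives)."""
    sup_pos = support(pattern, positives)
    sup_neg = support(pattern, negatives)
    if sup_neg == 0.0:
        return float("inf") if sup_pos > 0.0 else 0.0
    return sup_pos / sup_neg


# Toy data, invented for illustration only (not from the paper).
positives = ["TTAATAAAGT", "GCAATAAATT", "AATAAACCGT", "TTGTTTGTAA"]
negatives = ["GCGCGCGCTA", "CCAATAAAGG", "GGGTCCCGTA", "ACGTACGTAC"]

rate = growth_rate("AATAAA", positives, negatives)
print(rate)  # support 0.75 in positives vs 0.25 in negatives
```

A pattern with a high growth rate is both easy to read and discriminative, which matches the two advantages the abstract attributes to emerging patterns.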


► This paper deals with polyadenylation site prediction, an important problem in bioinformatics and medicine.
► We developed PolyA-iEP, a method for predicting polyadenylation sites effectively by systematically recognizing mRNA 3′ ends that contain a polyadenylation site.
► PolyA-iEP exploits the advantages of emerging patterns, namely high understandability and discriminating power, together with the strength of a distance-based scoring method that we propose.
► The extracted emerging patterns may span across many elements around the polyadenylation site, a novel idea introduced in this paper that can provide interesting biological insights.
► Experiments with Arabidopsis thaliana sequences have shown that PolyA-iEP achieves high prediction accuracy, which in our setup reaches a sensitivity of 93.7% and a specificity of 88.2%.
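
For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how sensitivity and specificity are computed from binary labels. The label vectors are toy values invented for illustration and are unrelated to the paper's experiments; the function name is likewise an assumption.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = site, 0 = non-site)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sens = tp / (tp + fn)  # share of true sites that were detected
    spec = tn / (tn + fp)  # share of non-sites that were correctly rejected
    return sens, spec


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity is computed over the positive examples only and specificity over the negative examples only, so the two figures can be read independently of the class balance in the test set.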

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 38, Issue 10, 15 September 2011, Pages 12398–12408