Free download: Unsupervised learning assisted robust prediction of bioluminescent proteins

Article code	Journal code	Year	English article	Full text
504827	864440	2016	10-page PDF	Free download

English title of the ISI article

Unsupervised learning assisted robust prediction of bioluminescent proteins



Keywords

Class imbalance; Training set diversity; Optimal class distribution; K-Means; SMOTE




Abstract

• Combination of unsupervised learning with SMOTE for imbalanced learning problems.
• Effective handling of between-class and within-class imbalance.
• Diversification of the training set with an optimal class distribution.
• Does not require evolutionary information for prediction.
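This page does not spell out how K-Means and SMOTE are combined, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes the positive (minority) class is clustered with K-Means and SMOTE is then run inside each cluster: every cluster is topped up to the same quota (within-class balancing) while the class as a whole grows to a chosen size (between-class balancing). The helper names `kmeans`, `smote` and `kmeans_smote` and their parameters (`target_size`, `n_clusters`, `k_neighbors`, `seed`) are hypothetical, and the code is plain Python so it needs no third-party packages.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def smote(samples, n_new, k_neighbors, rng):
    """Classic SMOTE: interpolate between a sample and one of its k nearest neighbours."""
    if not samples or n_new <= 0:
        return []
    if len(samples) == 1:
        # Nothing to interpolate with: fall back to plain duplication.
        return [list(samples[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        x = samples[i]
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(x, s))[:k_neighbors]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def kmeans_smote(minority, target_size, n_clusters=3, k_neighbors=5, seed=0):
    """Hypothetical helper: grow the minority class to at least target_size samples.

    Between-class balancing: the class as a whole reaches target_size.
    Within-class balancing: every cluster is topped up to the same quota,
    so sparse sub-groups receive proportionally more synthetic samples.
    """
    if target_size <= len(minority):
        return [list(p) for p in minority]
    rng = random.Random(seed)
    k = min(n_clusters, len(minority))
    labels = kmeans(minority, k, seed=seed)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c] for c in range(k)]
    clusters = [c for c in clusters if c]  # drop clusters that ended up empty
    quota = math.ceil(target_size / len(clusters))
    balanced = [list(p) for p in minority]  # keep every real sample
    for members in clusters:
        balanced.extend(smote(members, quota - len(members), k_neighbors, rng))
    return balanced


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 2-D positive class: one dense sub-group and one sparse one.
    positives = ([[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(12)]
                 + [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(4)])
    n_negatives = 40
    # Probe several positive:negative ratios, as the abstract describes.
    for ratio in (0.5, 0.75, 1.0):
        train_pos = kmeans_smote(positives, int(ratio * n_negatives), n_clusters=2)
        print(ratio, len(train_pos))
```

Restricting interpolation to a single cluster keeps synthetic points inside each sub-group rather than bridging distant ones; the final size can slightly exceed `target_size` when one cluster is already larger than the quota.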

Bioluminescence plays an important role in nature; for example, it is used for intracellular chemical signalling in bacteria. It is also a useful reagent for various analytical research methods, ranging from cellular imaging to gene expression analysis. However, identifying and annotating bioluminescent proteins is difficult because they share little sequence similarity with one another. In this paper, we present a novel approach for within-class and between-class balancing, as well as diversification, of a training dataset that effectively combines the unsupervised K-Means algorithm with the Synthetic Minority Oversampling Technique (SMOTE) in order to reveal the true performance of the prediction model. We further varied the balancing ratio of positive to negative data in the training dataset to probe for the optimal class distribution, i.e. the one that produces the best prediction accuracy. The appropriately balanced and diversified training set resulted in near-complete learning with better generalization on the blind test datasets. The results strongly support the view that an optimal class distribution with a high degree of diversity is essential for near-perfect learning. Using random forest as the weak learner in boosting and training it on the optimally balanced and diversified training dataset, we achieved an overall accuracy of 95.3% in tenfold cross-validation, and an accuracy of 91.7%, sensitivity of 89.3% and specificity of 91.8% on a holdout test set. The general framework discussed in this work could plausibly be applied to other biological datasets to handle class imbalance and incomplete learning effectively.
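The holdout figures fit an imbalanced test set: overall accuracy is the average of sensitivity and specificity weighted by the number of positive and negative test samples, so an accuracy of 91.7% lying almost on the 91.8% specificity implies that negatives greatly outnumber positives. The short sketch below, with hypothetical helper names `rates` and `accuracy_from_rates`, computes the three metrics from true and predicted labels and shows that weighting; the class counts in its demo are illustrative only, not the paper's test set.

```python
def rates(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives) and specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy as the class-size-weighted mean of sensitivity and specificity."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


if __name__ == "__main__":
    # Reported holdout rates; the negative counts per positive are illustrative only.
    for n_neg in (1, 10, 100):
        print(n_neg, round(accuracy_from_rates(0.893, 0.918, 1, n_neg), 4))
```

As the number of negatives per positive grows, the printed accuracy moves from the midpoint of the two rates toward the specificity, which is why sensitivity is worth reporting separately on imbalanced data.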

Publisher

Database: Elsevier - ScienceDirect
Journal: Computers in Biology and Medicine - Volume 68, 1 January 2016, Pages 27–36

Authors

Abhigyan Nath, Karthikeyan Subbiah
