Free article download: Network-constrained forest for regularized classification of omics data

Article code	Journal code	Publication year	English article	Full-text version
1993250	1541243	2015	10-page PDF	Free download

English title of the ISI article

Network-constrained forest for regularized classification of omics data



Keywords

99-00; Random forest; Domain knowledge; MicroRNA; Machine learning

Related topics

Life Sciences and Biotechnology; Biochemistry, Genetics and Molecular Biology; Biochemistry


Abstract

• Forest-based method for learning from high-dimensional data with a feature network.
• Does not rely strictly on domain knowledge; uses regularization instead.
• Diversifies base trees by sampling their features from a network-defined distribution.
• Improves on the accuracy and interpretability of standard Random Forest.

Contemporary molecular biology deals with wide and heterogeneous sets of measurements to model and understand underlying biological processes, including complex diseases. Machine learning is frequently used to build such models. However, models built solely from measured data often suffer from overfitting, as the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method on the task of disease classification based on measured mRNA and miRNA profiles, complemented by an interaction network composed of miRNA–mRNA target relations and mRNA–mRNA interactions corresponding to the interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently to improve the classification accuracy, stability and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome, which is our long-term research focus. We validate our approach in the public domain of ovarian carcinoma, using data of the same form. We believe that the idea of a network-constrained forest can straightforwardly be generalized to arbitrary omics data with an available and non-trivial feature interaction network. The proposed method is publicly available as part of the miXGENE system (http://mixgene.felk.cvut.cz); the workflow that implements the myelodysplastic syndrome experiments is presented as a dedicated case study.
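
The abstract does not spell out the network-defined distribution from which each base tree's features are sampled. The following is only a minimal sketch of the general idea, under stated assumptions: each tree picks a random seed feature, and the other features are weighted by a geometric decay in their hop distance from that seed. The decay rule, the function names, and the toy network identifiers are all hypothetical and are not taken from the paper or from miXGENE.

```python
import random
from collections import deque


def network_feature_weights(adjacency, seed, decay=0.5):
    """Return a probability for each feature reachable from `seed`.

    A feature at hop distance d gets raw weight decay**d. This decay
    rule is an illustrative assumption, not the paper's distribution.
    """
    distance = {seed: 0}
    queue = deque([seed])
    while queue:  # breadth-first search over the interaction network
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    raw = {feature: decay ** d for feature, d in distance.items()}
    total = sum(raw.values())
    return {feature: weight / total for feature, weight in raw.items()}


def sample_tree_features(adjacency, n_features, rng, decay=0.5):
    """Draw a subset of distinct features for one base tree."""
    seed = rng.choice(sorted(adjacency))
    weights = network_feature_weights(adjacency, seed, decay)
    candidates = sorted(weights)
    chosen = []
    # Weighted sampling without replacement, one feature at a time.
    while candidates and len(chosen) < n_features:
        pick = rng.choices(
            candidates, weights=[weights[f] for f in candidates]
        )[0]
        chosen.append(pick)
        candidates.remove(pick)
    return seed, chosen


# Hypothetical toy network: one miRNA with two mRNA targets, plus a
# short chain of mRNA-mRNA (protein-protein) interactions.
toy_network = {
    "miR-1": ["geneA", "geneB"],
    "geneA": ["miR-1", "geneC"],
    "geneB": ["miR-1"],
    "geneC": ["geneA", "geneD"],
    "geneD": ["geneC"],
}

rng = random.Random(0)
feature_subsets = [sample_tree_features(toy_network, 3, rng) for _ in range(5)]
```

Each sampled subset could then be passed to an ordinary decision tree learner, so that every tree in the forest is built from features that lie close together in the network.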

Publisher

Database: Elsevier - ScienceDirect
Journal: Methods - Volume 83, 15 July 2015, Pages 88–97

Authors

Michael Anděl, Jiří Kléma, Zdeněk Krejčík,
