کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
14905 1360 2016 8 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification
موضوعات مرتبط
مهندسی و علوم پایه مهندسی شیمی بیو مهندسی (مهندسی زیستی)
پیش نمایش صفحه اول مقاله
Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification
چکیده انگلیسی


• Identification of disease genes in semi-supervised learning methods, called positive-unlabeled learning.
• In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks.
• A Perceptron ensemble is learned from three weighted classifiers: multilevel support vector machine, k-nearest neighbor and decision tree.
• The main contributions of this paper are: (i) incorporating the statistical properties of gene data through choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) noise robustness characteristic of PEGPUL via using multilevel schema. In order to assess PEGPUL, we have applied it on 12,950 disease genes with 949 positive genes from six class of diseases and 12,001 unlabeled genes.
• Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance.

Identification of disease genes, using computational methods, is an important issue in biomedical and bioinformatics research. According to observations that diseases with the same or similar phenotype have the same biological characteristics, researchers have tried to identify genes by using machine learning tools. In recent attempts, some semi-supervised learning methods, called positive-unlabeled learning, is used for disease gene identification. In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks. In our method, a reliable set of positive and negative genes are extracted using co-training schema. Then, the similarity graph of genes is built using metric learning by concentrating on multi-rank-walk method to perform inference from labeled genes. At last, a Perceptron ensemble is learned from three weighted classifiers: multilevel support vector machine, k-nearest neighbor and decision tree. The main contributions of this paper are: (i) incorporating the statistical properties of gene data through choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) noise robustness characteristic of PEGPUL via using multilevel schema. In order to assess PEGPUL, we have applied it on 12950 disease genes with 949 positive genes from six class of diseases and 12001 unlabeled genes. Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance.

Figure optionsDownload as PowerPoint slide

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computational Biology and Chemistry - Volume 64, October 2016, Pages 263–270
نویسندگان
, ,