Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4948278 | Neurocomputing | 2016 | 27 Pages |
Abstract
Predicting the subcellular locations of proteins is a critical step toward understanding their functions. Numerous computational methods have recently been developed for protein subcellular localization prediction, and most existing methods rely on Gene Ontology (GO) information for feature representation. Although prior research has shown that GO information improves predictive performance, it produces an extremely high-dimensional feature space whose dimension keeps growing as the number of terms in the GO database increases. To address this issue, we propose a novel feature representation method that thoroughly exploits sequence evolutionary information instead of GO information. Using the proposed method, we generate a comprehensive set of 828 features from three sources: physicochemical properties, the position-specific scoring matrix (PSSM), and the k-skip-n-gram model. By combining a multi-label ensemble classifier with the proposed features, we further develop a novel multi-label learning method, namely mGOF-loc. Results on an updated large-scale dataset covering 37 subcellular locations show that mGOF-loc outperforms existing methods. A webserver implementing mGOF-loc is currently freely available at http://server.malab.cn/mGOF-loc/Index.html.
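The abstract names the k-skip-n-gram model without describing it. The general idea can be illustrated with a minimal sketch for n = 2, assuming the model counts pairs of residues separated by at most k skipped positions and normalizes the counts. The function name `k_skip_bigram_features`, the 20-letter alphabet, the default `k`, and the example sequence are all assumptions made for illustration; this is not the authors' implementation, and the paper may apply the model to a different input than the raw sequence.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical order of their codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_skip_bigram_features(sequence, k=1):
    """Return normalized frequencies of residue pairs with at most k skipped positions.

    Hypothetical helper for illustration; pairs containing non-standard
    residue codes are ignored.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence)):
        for skip in range(k + 1):
            j = i + skip + 1  # partner residue after skipping `skip` positions
            if j >= len(sequence):
                break
            pair = sequence[i] + sequence[j]
            if pair in counts:
                counts[pair] += 1
                total += 1
    # One feature per ordered residue pair: 20 * 20 = 400 values.
    return [counts[p] / total if total else 0.0 for p in pairs]


features = k_skip_bigram_features("MKVLAAGIV", k=1)
print(len(features))  # 400
```

With k = 0 the sketch reduces to ordinary adjacent-pair (bigram) frequencies; larger k also captures pairs that are close but not adjacent in the sequence.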
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Leyi Wei, Minghong Liao, Xing Gao, Jingjing Wang, Weiqi Lin