Article ID: 2079012
Journal: Chinese Journal of Biotechnology
Published Year: 2008
Pages: 7
File Type: PDF
Abstract
Discriminating outer membrane proteins (OMPs) from globular proteins and from membrane proteins of other folding types is an important task, both for identifying OMPs in genomic sequences and for successfully predicting their secondary and tertiary structures. This study describes a discriminative method that combines feature vectors for protein sequence coding with machine learning techniques for classification. The combined feature vector consists of three types of features: amino acid composition, dipeptide composition, and weighted AAindex correlation coefficients. On the basis of this combined feature coding and support vector machine (SVM) algorithms, a classification system named CF_SVM is developed. In cross-validation tests and independent dataset tests on a dataset of 1087 proteins covering all the different types of globular and membrane proteins, CF_SVM outperforms other methods reported in the literature for discriminating OMPs from other proteins. The influence of different ranks and weights of the AAindex correlation coefficients in the combined features on discrimination accuracy is also discussed. The results of OMP mining in 14 bacterial genomes show that CF_SVM can predict OMP candidates with high specificity and performs better than another OMP mining tool, TMBETA-GENOME.
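The combined feature coding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it concatenates 20 amino acid composition values, 400 dipeptide composition values, and a small number of sequence-order correlation factors computed from one AAindex scale. The choice of the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101), the squared-difference correlation formula (borrowed from Chou's pseudo amino acid composition), and the helper names `correlation_factors` and `combined_features` with their `max_rank` and `weight` parameters are all hypothetical; the abstract does not specify the exact index, formula, or weighting scheme. The SVM classification step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Illustrative AAindex scale (Kyte-Doolittle hydropathy); an assumption,
# not necessarily an index used by CF_SVM.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aa_composition(seq):
    """20 relative frequencies of the standard residues (others ignored)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for aa in seq:
        if aa in counts:
            counts[aa] += 1
    n = sum(counts.values())
    return [counts[aa] / n if n else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400 relative frequencies of adjacent residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skips pairs containing non-standard residues
            counts[pair] += 1
    n = sum(counts.values())
    return [counts[d] / n if n else 0.0 for d in DIPEPTIDES]


def standardize(index):
    """Rescale an index to zero mean and unit variance over the 20 residues."""
    values = [index[aa] for aa in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return {aa: (index[aa] - mean) / sd for aa in AMINO_ACIDS}


def correlation_factors(seq, index, max_rank):
    """Rank-k correlation: mean squared index difference between residues k apart."""
    h = standardize(index)
    residues = [aa for aa in seq if aa in h]
    thetas = []
    for k in range(1, max_rank + 1):
        pairs = len(residues) - k
        if pairs <= 0:
            thetas.append(0.0)
            continue
        total = sum((h[residues[i]] - h[residues[i + k]]) ** 2 for i in range(pairs))
        thetas.append(total / pairs)
    return thetas


def combined_features(seq, index, max_rank=3, weight=0.1):
    """Concatenate AAC (20) + DPC (400) + weighted correlation factors (max_rank)."""
    seq = seq.upper()
    thetas = correlation_factors(seq, index, max_rank)
    return aa_composition(seq) + dipeptide_composition(seq) + [weight * t for t in thetas]
```

For a sequence such as `"MKKTAIAIAVALAGFATVAQA"`, `combined_features(seq, KYTE_DOOLITTLE)` returns a 423-dimensional vector (20 + 400 + 3). Varying `max_rank` and `weight` corresponds loosely to the study's discussion of how the ranks and weights of the correlation coefficients affect accuracy; here the weight is simply a multiplier, which is one possible scheme among several.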