Article ID: 531150
Journal: Pattern Recognition
Published Year: 2006
Pages: 11 Pages
File Type: PDF
Abstract
Predicting the cellular location of a protein plays an important role in inferring its function. Feature extraction is a critical part of prediction systems: raw sequence data must be transformed into appropriate numerical feature vectors while minimizing information loss. In this paper, we present a method for extracting useful features from protein sequence data. The method employs local and global pairwise sequence alignment scores as well as composition-based features. Five different features are used to train support vector machines (SVMs) separately, and weighted majority voting makes the final decision. The overall prediction accuracy, evaluated by 5-fold cross-validation, reached 88.53% for the eukaryotic animal data set. Comparing the prediction accuracy of the various feature extraction methods provides biological insight into the location of targeting information. Our experimental results confirm that our feature extraction methods are very useful for predicting the subcellular localization of proteins.
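The two ingredients named in the abstract, composition-based features and weighted majority voting over per-feature classifiers, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the location labels, and the weights are hypothetical, and the abstract does not say how the voting weights are chosen. The sketch assumes each of the five classifiers has already produced one predicted label.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return a 20-element vector with the fraction of each standard residue.

    Illustrative composition feature only; non-standard symbols are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (hypothetical values).
    Ties go to the label that appears first in `predictions`.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical outputs of five per-feature classifiers for one protein.
predictions = ["nuclear", "cytoplasmic", "nuclear", "mitochondrial", "cytoplasmic"]
weights = [0.9, 0.8, 0.7, 0.6, 0.85]
final_label = weighted_majority_vote(predictions, weights)
```

Here the two "cytoplasmic" votes carry a combined weight of 1.65, against 1.6 for "nuclear", so the weighted vote selects "cytoplasmic" even though both labels receive two votes.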
Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition