Article code | Journal code | Publication year | English article | Full-text version
5513578 | 1541216 | 2016 | 9 pages | PDF
ISI article title
Prediction of Protein-Protein Interaction via co-occurring Aligned Pattern Clusters
Keywords
Protein-protein interaction, Aligned pattern cluster, Supervised learning, Random forest
Related subjects
Life Sciences and Biotechnology; Biochemistry, Genetics and Molecular Biology; Biochemistry
Abstract


- Aligned Pattern Clusters (APCs) were introduced to model sequence patterns with variable lengths and variants.
- cAPC pairs were developed to model co-occurring sequence patterns in PPIs.
- A method was proposed to turn a protein pair into a feature vector using cAPC pairs.
- WeMine-PPI, a new PPI prediction method that achieves superior prediction results, was proposed.
- WeMine-PPI allows a biologically intuitive interpretation of the feature vector.
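To make the idea of an APC with variable length and variants concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical representation in which an APC is a list of equal-length aligned patterns with `-` marking alignment gaps; the names `APC`, `variants`, `columns` and `matches` are illustrative and not taken from the paper, and the sketch does not show how WeMine-P2P actually discovers APCs.

```python
# Hypothetical APC: equal-length aligned patterns; "-" marks an alignment gap.
APC = ["AC-GT", "ACKGT", "AS-GT"]


def variants(aligned_patterns):
    """Strip gaps to obtain the concrete, possibly different-length patterns."""
    return {p.replace("-", "") for p in aligned_patterns}


def columns(aligned_patterns):
    """Residues observed at each aligned position, ignoring gaps."""
    return [set(col) - {"-"} for col in zip(*aligned_patterns)]


def matches(aligned_patterns, sequence):
    """Sorted (start, variant) hits for every occurrence of any APC variant."""
    hits = []
    for v in variants(aligned_patterns):
        start = sequence.find(v)
        while start != -1:
            hits.append((start, v))
            start = sequence.find(v, start + 1)
    return sorted(hits)


print(sorted(variants(APC)))        # ['ACGT', 'ACKGT', 'ASGT']
print(columns(APC))
print(matches(APC, "MASGTPACKGT"))  # [(1, 'ASGT'), (6, 'ACKGT')]
```

Removing gaps turns the three aligned rows into concrete patterns of lengths 4 and 5, and the second column admits either C or S: this is the variable-length, variant-tolerant behavior that a fixed-length exact pattern cannot capture.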

Predicting protein-protein interactions (PPIs) is important for discovering the molecular mechanisms inside a cell. Traditionally, new PPIs are identified through biochemical experiments, but such methods are labor-intensive, expensive, time-consuming and technically unreliable because of high false-positive rates. Sequence-based prediction is currently the most readily applicable and cost-effective approach: it uses known PPI databases to construct classifiers that predict unknown PPIs from sequence data alone, without requiring any other prior knowledge. Among existing sequence-based methods, most feature-based methods use exact, fixed-length sequence patterns as features, a constraint that is biologically unrealistic. An SVM with a pairwise string kernel achieves better predictive performance, but it is hard to interpret biologically because, being kernel-based, it computes no concrete feature values. Here we develop a novel method, WeMine-P2P, to overcome these drawbacks. Assuming that the regions or sites that mediate PPIs are more conserved, WeMine-P2P first discovers and locates conserved sequence patterns in protein sequences in the form of Aligned Pattern Clusters (APCs), which allow pattern variants of variable length. It then pairs up all APCs into a set of Co-Occurring APC (cAPC) pairs and computes a cAPC-PPI score for each cAPC pair over all PPI pairs. Finally, for each PPI pair it constructs a feature vector composed of all cAPC pairs with their cAPC-PPI scores, and uses these vectors to build a PPI predictor.
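The pairing, scoring and feature-vector steps described above can be sketched in Python as follows. The abstract does not define the cAPC-PPI score, so this sketch assumes a hypothetical one: the fraction of known PPI pairs in which a cAPC pair co-occurs (one APC found in one protein and the other APC in its partner, in either orientation). APCs are simplified to sets of gap-free variant strings, and all names (`APCS`, `KNOWN_PPIS`, `co_occurs`, `capc_ppi_scores`, `feature_vector`) and data are illustrative assumptions rather than the authors' implementation.

```python
from itertools import combinations_with_replacement

# Hypothetical APCs, simplified to sets of gap-free variant strings.
APCS = {
    "apc1": {"ACGT", "ACKGT"},
    "apc2": {"WWP", "WYP"},
    "apc3": {"LLKL"},
}

# Hypothetical training set of known interacting protein pairs.
KNOWN_PPIS = [("MACGTQ", "GWWPA"), ("ACKGTLL", "WYPRR"), ("LLKLAA", "QQQ")]


def occurs(variants, sequence):
    """True if any variant of an APC is a substring of the sequence."""
    return any(v in sequence for v in variants)


def co_occurs(pair, seq_a, seq_b, apcs):
    """A cAPC pair (x, y) co-occurs if x is in one protein and y in the other."""
    x, y = pair
    forward = occurs(apcs[x], seq_a) and occurs(apcs[y], seq_b)
    reverse = occurs(apcs[y], seq_a) and occurs(apcs[x], seq_b)
    return forward or reverse


def capc_pairs(apcs):
    """All unordered APC pairs, including an APC paired with itself."""
    return list(combinations_with_replacement(sorted(apcs), 2))


def capc_ppi_scores(pairs, ppis, apcs):
    """Assumed score: fraction of known PPI pairs in which each cAPC pair co-occurs."""
    return {
        p: sum(co_occurs(p, a, b, apcs) for a, b in ppis) / len(ppis)
        for p in pairs
    }


def feature_vector(seq_a, seq_b, pairs, scores, apcs):
    """One entry per cAPC pair: its score if it co-occurs in the query pair, else 0."""
    return [scores[p] if co_occurs(p, seq_a, seq_b, apcs) else 0.0 for p in pairs]


pairs = capc_pairs(APCS)
scores = capc_ppi_scores(pairs, KNOWN_PPIS, APCS)
query = feature_vector("TTACGTT", "PWYPP", pairs, scores, APCS)
print(pairs)
print(query)  # [0.0, 0.6666666666666666, 0.0, 0.0, 0.0, 0.0]
```

With these toy data only the (`apc1`, `apc2`) pair co-occurs in known PPIs (in two of three), so it is the only non-zero entry in the query's feature vector. Because each feature is tied to a named pair of pattern clusters, a non-zero entry can be traced back to concrete sequence patterns, unlike a kernel value. Training a classifier on such vectors (the keywords mention a random forest) is not shown here.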
Through 40 independent experiments, we showed that (1) WeMine-P2P outperforms the well-known algorithm PIPE2, which also uses co-occurring amino acid sequence segments but does not allow variable lengths or pattern variations; (2) WeMine-P2P achieves satisfactory PPI prediction performance, comparable to SVM-based methods, particularly on unseen protein sequences, while potentially reducing the feature dimension 1280-fold; and (3) unlike SVM-based methods, WeMine-P2P yields biologically interpretable features, from which we observed that co-occurring sequence patterns from compositionally biased regions are more discriminative. WeMine-P2P can be extended to predict other biosequence interactions, such as protein-DNA interactions.

Publisher
Database: Elsevier - ScienceDirect
Journal: Methods - Volume 110, 1 November 2016, Pages 26-34
Authors
, , , ,