Article code | Journal code | Publication year | English article | Full-text version
4948279 | 1439610 | 2016 | 9 pages, PDF | Free download
English title of the ISI article
Learning from real imbalanced data of 14-3-3 proteins binding specificity
Persian translation of the title
یادگیری از داده های نامتقارن واقعی از ویژگی های اتصال 14-3-3 پروتئین
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
The 14-3-3 proteins are a highly conserved family of homodimeric and heterodimeric molecules expressed in all eukaryotic cells. In human cells, this family consists of seven distinct but highly homologous 14-3-3 isoforms. 14-3-3σ is the only isoform directly linked to cancer in epithelial cells, and it is regulated by a major tumor suppressor gene. For each 14-3-3 isoform, we have 1000 peptide motifs with experimental binding affinity values. In this paper, we present a novel method for identifying peptide motifs that bind to the 14-3-3σ isoform. First, we select nine physicochemical properties of amino acids to describe each peptide motif. We also use auto-cross covariance to extract correlative properties of amino acids at any two positions. Then, a similarity-based undersampling approach and a SMOTE-like oversampling approach are used to deal with the imbalanced distribution of the known peptide motifs. Finally, we use locally weighted regression, which combines the simplicity of linear least squares regression with the flexibility of nonlinear regression, to predict the affinity values of peptide motifs. We test our method on the 1000 peptide motifs binding to the seven 14-3-3 isoforms. On the 14-3-3σ isoform, our method achieves an overall Pearson product-moment correlation coefficient (PCC) and root mean squared error (RMSE) of 0.83 and 258.31 for the N-terminal sublibrary, and 0.80 and 250.89 for the C-terminal sublibrary, respectively. We identify phosphopeptides that preferentially bind to 14-3-3σ over other isoforms. Several positions on the peptide motifs have the same amino acid as the experimental substrate specificity of phosphopeptides binding to 14-3-3σ. Our method is a fast and reliable computational method that can be used for peptide-protein binding identification in proteomics research.
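The regression and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `tau`, the small `ridge` stabiliser, and the function names are all assumptions made for illustration, and the demo data are synthetic rather than peptide descriptors.

```python
import numpy as np


def locally_weighted_predict(X_train, y_train, x_query, tau=1.0, ridge=1e-8):
    """Predict one target value with locally weighted linear regression.

    tau (kernel bandwidth) and ridge (numerical stabiliser) are
    illustrative assumptions, not values taken from the paper.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Gaussian kernel weights: training points near the query count more.
    sq_dist = np.sum((X_train - x_query) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * tau ** 2))

    # Add an intercept column, then solve the weighted normal equations
    # (A^T W A + ridge * I) beta = A^T W y.
    A = np.hstack([np.ones((X_train.shape[0], 1)), X_train])
    AtW = A.T * w  # scales column i of A^T by the weight of sample i
    beta = np.linalg.solve(AtW @ A + ridge * np.eye(A.shape[1]), AtW @ y_train)

    return float(np.concatenate(([1.0], x_query)) @ beta)


def pcc(y_true, y_pred):
    """Pearson product-moment correlation coefficient."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    # Synthetic stand-in data: 200 samples, 3 features, nonlinear target.
    rng = np.random.default_rng(0)
    X = rng.uniform(-2.0, 2.0, size=(200, 3))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

    X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]
    preds = [locally_weighted_predict(X_tr, y_tr, x, tau=0.5) for x in X_te]
    print("PCC :", pcc(y_te, preds))
    print("RMSE:", rmse(y_te, preds))
```

Each query point gets its own weighted least-squares fit, which is how the approach keeps the simplicity of a linear model while following nonlinear structure. A smaller `tau` makes the fit more local and a larger one approaches ordinary linear regression.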
Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 217, 12 December 2016, Pages 83-91