Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
8406333 | Biosystems | 2018 | 8 Pages |
Abstract
In this work, we sought to develop an alignment-free scoring method for studying human protein sequences and identifying their disease associations. A total of 52 protein sequences were analyzed. The training data set contained 599 reported polymorphic sites and 802 Single Amino acid Variants (SAVs), of which 708 were polymorphic and 94 disease-associated. For cross-validation, another set of 62 protein sequences (26 Enzymes, 16 Membrane-bound Enzymes and 20 Membrane-bound Proteins) was used, with a total of 261 reported polymorphic sites and 799 SAVs (291 polymorphic and 508 disease-associated). In both the training and cross-validation data sets, the percentage of reported disease-associated SAVs was negatively correlated with the ratio of polymorphic sites to protein length. A new scoring pattern was also developed that takes this ratio into account by counting the number of polymorphic amino acids and the total number of amino acids in the proteins.
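The two quantities the abstract relates can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so this is not the authors' method: it only computes the ratio of polymorphic sites to protein length and the percentage of disease-associated SAVs per protein, then a Pearson correlation between them. The function names and the per-protein records in `toy_proteins` are hypothetical, invented purely for illustration.

```python
from math import sqrt


def site_ratio(polymorphic_sites: int, protein_length: int) -> float:
    """Ratio of polymorphic sites to protein length (both as residue counts)."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return polymorphic_sites / protein_length


def disease_percentage(disease_savs: int, total_savs: int) -> float:
    """Percentage of reported SAVs that are disease-associated."""
    if total_savs <= 0:
        raise ValueError("total_savs must be positive")
    return 100.0 * disease_savs / total_savs


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical records, NOT taken from the paper:
# (polymorphic_sites, protein_length, disease_savs, total_savs)
toy_proteins = [
    (5, 500, 18, 20),
    (10, 400, 12, 20),
    (20, 400, 6, 20),
    (30, 300, 2, 20),
]

ratios = [site_ratio(p, length) for p, length, _, _ in toy_proteins]
percentages = [disease_percentage(d, t) for _, _, d, t in toy_proteins]
r = pearson(ratios, percentages)
print(f"Pearson r = {r:.3f}")  # negative for this toy data
```

Applied to the pooled counts in the abstract, `disease_percentage(94, 802)` gives about 11.7% for the training set and `disease_percentage(508, 799)` gives about 63.6% for the cross-validation set. The correlation reported in the abstract would need per-protein counts, which are not listed here.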
Related Topics
Physical Sciences and Engineering
Mathematics
Modelling and Simulation
Authors
Ananya Ali, Angshuman Bagchi