Article ID: 2184477
Journal: Journal of Molecular Biology
Published Year: 2013
Pages: 13
File Type: PDF
Abstract

The widespread application of whole-genome sequencing is identifying numerous non-synonymous single nucleotide polymorphisms (nsSNPs), many of which are associated with disease. We analyzed nsSNPs from Humsavar and the 1000 Genomes Project to investigate why some proteins and domains are more tolerant of mutations than others. We identified 311 proteins and 112 Pfam families, corresponding to 2910 domains, as disease-susceptible and 32 proteins and 67 Pfam families (10,783 domains) as disease-resistant based on the relative numbers of disease-associated and neutral polymorphisms. Proteins with no significant difference from expected numbers of disease and polymorphism nsSNPs are classified as "other". This classification takes into account the phenotypes of all known mutations in the protein or domain rather than simply classifying based on the presence or absence of disease nsSNPs. Of the two hypotheses suggested, our results support the model that disease-resistant domains and proteins are better able to tolerate mutations, rather than the model that they have more lethal mutations that are not observed. Disease-resistant proteins and domains show significantly higher mutation rates and lower sequence conservation than disease-susceptible proteins and domains. Disease-susceptible proteins are more likely to be encoded by essential genes, are more central in protein–protein interaction networks and are less likely to contain loss-of-function mutations in healthy individuals. We use this classification for nsSNP phenotype prediction, predicting nsSNPs in disease-susceptible domains to be disease and those in disease-resistant domains to be polymorphism. In this way, we achieve higher accuracy than SIFT, a state-of-the-art algorithm.
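
The classification described above can be illustrated with a minimal sketch. The abstract does not state which statistical test or significance threshold was used, so the exact two-sided binomial test, the `alpha` cutoff, the background disease fraction and all function names below are assumptions made for illustration only, not the authors' method. The sketch also omits any multiple-testing correction.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binom_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed one."""
    observed = binom_pmf(k, n, p)
    cutoff = observed * (1 + 1e-9)  # small tolerance for floating-point ties
    total = sum(pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
                if pmf <= cutoff)
    return min(1.0, total)


def classify(n_disease, n_neutral, background_disease_fraction, alpha=0.05):
    """Label a protein or domain from its nsSNP counts.

    background_disease_fraction is the (assumed) expected share of disease
    nsSNPs across the whole data set; alpha is a hypothetical threshold.
    """
    n = n_disease + n_neutral
    if n == 0:
        return "other"
    p_value = two_sided_binom_p(n_disease, n, background_disease_fraction)
    if p_value >= alpha:
        return "other"  # no significant difference from expectation
    if n_disease / n > background_disease_fraction:
        return "disease-susceptible"
    return "disease-resistant"


def predict_phenotype(label):
    """Predict an nsSNP phenotype from the label of its protein or domain.
    Returns None for 'other', where this rule makes no prediction."""
    return {
        "disease-susceptible": "disease",
        "disease-resistant": "polymorphism",
    }.get(label)
```

Under these assumptions, a domain with 20 disease nsSNPs and no neutral ones against a background fraction of 0.5 would be labeled disease-susceptible, and any new nsSNP falling in it would be predicted as disease.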

Highlights
► Many disease-associated nsSNPs and the proteins containing them have been studied.
► We use the numbers of disease and neutral nsSNPs to classify proteins and domains.
► Tolerance of proteins and domains for mutations is related to conservation.
► It is also related to interaction networks and function.
► This classification is useful for predicting phenotypes of nsSNPs.

Related Topics
Life Sciences Biochemistry, Genetics and Molecular Biology Cell Biology