Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
15199 | Computational Biology and Chemistry | 2012 | 9 Pages | |
Mutual information (MI) is commonly used to estimate the evolutionary correlation between two amino acid sites. Although several MI methods exist, prior to our contribution no systematic method had been developed to assess their performance or to establish numerical thresholds for detecting co-evolving amino acid sites. In this study, we ran a Markov chain Monte Carlo (MCMC) algorithm on influenza viral sequences to capture their evolutionary characteristics. From the MCMC samples we estimated a consensus maximum clade credibility (MCC) tree, together with amino acid substitution statistics, and used them to generate synthetic sequences containing known dependent and independent pairs of amino acid sites. These synthetic sequences were generated in Bayesian Evolutionary Analysis Sampling Trees (BEAST) using a pair-to-pair, influenza-specific amino acid substitution matrix (P2PFLU). The sequences inherited the evolutionary features and co-varying characteristics of the real viral sequences, making them well suited for studying co-evolution. As an MI measure, we proposed a novel metric, the empirical MI (MIEm), which outperformed other MI measures in receiver operating characteristic (ROC) analysis. Applying our approach to 1086 PB2 sequences of influenza A H5N1 viruses collected across all available years, we found 97 sites that co-evolve with one or more other amino acid sites. In particular, PB2 451, along with eight other PB2 sites with various MIEm scores, was found to co-evolve with PB2 627, a known species-associated residue that plays a critical role in influenza virus replication.
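The abstract does not define MIEm, so the sketch below is not the authors' metric. It only illustrates the standard plug-in MI between two alignment columns, estimated from observed residue frequencies, which is the kind of pairwise quantity the MI methods mentioned above score. The function name `column_mi` and the toy columns are hypothetical, and gap handling and small-sample corrections are deliberately omitted.

```python
import math
from collections import Counter


def column_mi(col_a, col_b):
    """Plug-in mutual information (in nats) between two alignment columns.

    Hypothetical helper, not the paper's MIEm. col_a and col_b are
    equal-length strings; character i of each is the residue observed at
    that site in sequence i. Gaps are treated as ordinary symbols.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)               # marginal counts at site A
    count_b = Counter(col_b)               # marginal counts at site B
    count_ab = Counter(zip(col_a, col_b))  # joint counts of residue pairs
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), rewritten in terms of counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns (not real PB2 data): identical patterns vs. unrelated patterns.
print(column_mi("KKEE", "KKEE"))  # fully dependent sites: log(2) nats
print(column_mi("KKEE", "KEKE"))  # independent sites: 0.0
```

Two columns that vary in lockstep reach the maximum MI allowed by their marginals, while columns whose residue combinations occur at exactly the rates expected under independence score zero.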
Highlights

- Synthetic sequences featuring co-evolutionary characteristics are generated.
- A novel MIEm measure is proposed to detect co-evolving amino acid pairs.
- MIEm scores are computed on synthetic sequences, and ROC analysis is performed.
- A cut-off value for detecting co-evolving amino acid sites is chosen.
- Co-evolving sites in real sequences that meet the prescribed cut-off are summarized.
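The ROC analysis and cut-off selection listed in the highlights can be sketched as follows, assuming each synthetic site pair has a co-evolution score and a known label (dependent or independent). The page does not say how the cut-off was chosen, so maximizing Youden's J (TPR minus FPR) is an assumed stand-in criterion. The function names and scores are hypothetical and are not results from the paper.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a random
    dependent (positive) pair outscores a random independent (negative)
    pair, with ties counted as 1/2 (the Mann-Whitney formulation)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def youden_cutoff(pos_scores, neg_scores):
    """Return (cutoff, tpr, fpr) that maximizes TPR - FPR, where a pair is
    called co-evolving when its score is >= cutoff. Youden's J is an
    assumed criterion here; the first (lowest) cutoff wins any tie."""
    best = None
    for c in sorted(set(pos_scores) | set(neg_scores)):
        tpr = sum(s >= c for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= c for s in neg_scores) / len(neg_scores)
        j = tpr - fpr
        if best is None or j > best[0]:
            best = (j, c, tpr, fpr)
    return best[1], best[2], best[3]


# Hypothetical scores for known dependent and independent synthetic pairs.
dependent = [0.9, 0.8, 0.7, 0.4]
independent = [0.5, 0.3, 0.2, 0.1]
print(roc_auc(dependent, independent))        # 0.9375
print(youden_cutoff(dependent, independent))  # (0.4, 1.0, 0.25)
```

The quadratic pairwise AUC keeps the sketch dependency-free; for large synthetic datasets a rank-based computation or a library ROC routine would be the more practical choice.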