Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences

Article ID	Journal	Published Year	Pages	File Type
518301	Journal of Computational Physics	2014	16 Pages	PDF

Abstract

Direct-coupling analysis is a group of methods to harvest information about coevolving residues in a protein family by learning a generative model in an exponential family from data. In protein families of realistic size, this learning can only be done approximately, and there is a trade-off between inference precision and computational speed. Here we show that a previously introduced ℓ2-regularized pseudolikelihood maximization method called plmDCA can be modified so as to be easily parallelizable, as well as inherently faster on a single processor, at negligible difference in accuracy. We test the new incarnation of the method on 143 protein family/structure pairs from the Protein Families database (PFAM), one of the larger tests of this class of algorithms to date.
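
To illustrate why a pseudolikelihood objective lends itself to parallelization, the following is a minimal sketch, not the authors' implementation. It assumes a Potts model with fields and couplings and evaluates an ℓ2-regularized negative log-pseudolikelihood for a single site, which depends only on that site's own parameters and so could be minimized independently of the other sites. The function name, the array layout, and the regularization strengths `lam_h` and `lam_J` are hypothetical choices made for illustration, and sequence reweighting is left out.

```python
import numpy as np


def site_neg_log_pseudolikelihood(h_r, J_r, seqs, r, lam_h=0.01, lam_J=0.01):
    """l2-regularized negative log-pseudolikelihood for one site r.

    Hypothetical layout (an assumption, not taken from the paper):
      h_r  : (q,) fields for site r
      J_r  : (L, q, q) couplings; J_r[i, a, b] couples state a at site r
             with state b at site i (the slice i == r is ignored)
      seqs : (B, L) integer-coded alignment with entries in [0, q)
    """
    B, L = seqs.shape

    # logits[b, a] = h_r[a] + sum over i != r of J_r[i, a, seqs[b, i]]
    logits = np.tile(h_r, (B, 1)).astype(float)
    for i in range(L):
        if i == r:
            continue
        logits += J_r[i][:, seqs[:, i]].T

    # Numerically stable log-softmax over the q states of site r.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average negative log-probability of the observed state at site r.
    nll = -log_probs[np.arange(B), seqs[:, r]].mean()

    # l2 penalty on the parameters belonging to this site only.
    others = np.ones(L, dtype=bool)
    others[r] = False
    penalty = lam_h * np.sum(h_r ** 2) + lam_J * np.sum(J_r[others] ** 2)
    return nll + penalty
```

Because each call touches only `h_r` and `J_r`, one such objective per site could be handed to a separate worker. Under this scheme each pair of sites receives two coupling estimates, one from each site's problem, which would then have to be combined, for example by averaging.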

Keywords

Contact map; Pseudolikelihood; Potts model; Protein structure prediction