Protein 3D Structure Computed from Evolutionary Sequence Variation

Overview of attention for article published in PLOS ONE, December 2011

Readers on

1293 Mendeley
27 CiteULike
DOI 10.1371/journal.pone.0028766
Authors

Debora S. Marks, Lucy J. Colwell, Robert Sheridan, Thomas A. Hopf, Andrea Pagnani, Riccardo Zecchina, Chris Sander

Abstract

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.

In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy.

We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).

This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.

X Demographics

The data shown below were collected from the profiles of 49 X users who shared this research output.

Mendeley readers

The data shown below were compiled from readership statistics for 1,293 Mendeley readers of this research output.

Geographical breakdown

Country           Count  % of readers
United States     34     3%
United Kingdom    18     1%
Germany           13     1%
Canada            7      <1%
Spain             5      <1%
Argentina         3      <1%
China             3      <1%
India             2      <1%
Italy             2      <1%
Other             24     2%
Unknown           1182   91%

Demographic breakdown

Readers by professional status  Count  % of readers
Student > Ph.D. Student         370    29%
Researcher                      268    21%
Student > Bachelor              142    11%
Student > Master                135    10%
Student > Doctoral Student      48     4%
Other                           189    15%
Unknown                         141    11%

Readers by discipline                         Count  % of readers
Agricultural and Biological Sciences          463    36%
Biochemistry, Genetics and Molecular Biology  287    22%
Computer Science                              117    9%
Chemistry                                     85     7%
Physics and Astronomy                         62     5%
Other                                         115    9%
Unknown                                       164    13%