Design a sequence to fold into a protein structure

ProteinMPNN outperforms traditional physics-based approaches such as Rosetta: on native protein backbones it achieves 52.4% sequence recovery, compared with 32.9% for Rosetta. It can design sequences for single or multiple chains and has been validated experimentally by X-ray crystallography, cryoEM, and functional studies. It has been used successfully to design monomers, cyclic homo-oligomers, and target-binding proteins.
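
Sequence recovery is the fraction of positions where the designed sequence has the same amino acid as the native one. The snippet below is a minimal sketch of that calculation, assuming the two sequences are already aligned and of equal length; the function name and the example sequences are made up for illustration and are not part of ProteinMPNN.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Hypothetical 10-residue example: 8 of 10 positions are identical.
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
print(f"{sequence_recovery(native, designed):.1%}")  # 80.0%
```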

ProteinMPNN is often run after RFdiffusion to generate sequences for a designed structure, because RFdiffusion and RFantibody output structures with poly-glycine (poly-G) placeholders at the designed residues. It can also be run directly on a starting structure to generate stabilizing mutations.
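
If you are starting from an RFdiffusion output, one rough way to spot the placeholder regions is to look for runs of consecutive glycines in each chain's sequence. The helper below is a hypothetical sketch, not part of ProteinMPNN or RFdiffusion. It assumes you already have the chain sequence as a string, and it will also pick up natural glycine-rich stretches, so treat the result as a starting point for choosing designed residues.

```python
import re


def glycine_runs(sequence, min_run=3):
    """Return 1-based inclusive (start, end) spans of consecutive glycines."""
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    pattern = re.compile("G{%d,}" % min_run)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence.upper())]


# Hypothetical chain: residues 4-9 are a glycine placeholder run;
# the isolated glycine at position 13 is shorter than min_run and is ignored.
spans = glycine_runs("MKTGGGGGGLLAGE")
print(spans)  # [(4, 9)]
```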

Inputs

  • PDB File
  • Designed Residues - select residues on each chain to be designed
  • Temperature - adjusts the amount of diversity in your sequences. Higher values generate more mutations.
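
To see why a higher temperature gives more diverse sequences, it helps to look at how temperature reshapes the probabilities that a residue is sampled from. The sketch below assumes the usual softmax(logits / temperature) sampling scheme. The logit values and the function name are invented for illustration, and only three amino acids are shown instead of twenty.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert per-amino-acid logits into probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {aa: value / temperature for aa, value in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {aa: math.exp(value - top) for aa, value in scaled.items()}
    total = sum(weights.values())
    return {aa: weight / total for aa, weight in weights.items()}


# Hypothetical logits for one position.
logits = {"A": 2.0, "L": 1.5, "G": 0.5}

low = softmax_with_temperature(logits, 0.1)   # sharply peaked on "A"
high = softmax_with_temperature(logits, 1.0)  # flatter, so "L" and "G" are sampled more often

for aa in logits:
    print(aa, round(low[aa], 3), round(high[aa], 3))
```

At low temperature nearly all of the probability sits on the top-scoring residue, so repeated runs return nearly the same sequence. At higher temperature the alternatives are sampled more often, which shows up as more mutations.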

Outputs

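  • Overall Confidence: exp[-mean_over_residues(log_probs)], taken over all redesigned residues. Here log_probs is read as the negative log-probability of the designed amino acid at each position, so the value lies between 0 and 1. Higher means more confident.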
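
The confidence value can be reproduced by hand from per-residue probabilities. The sketch below assumes you have the log-probability (a value at or below zero) that the model assigned to the chosen amino acid at each redesigned position. The function name and the example probabilities are hypothetical.

```python
import math


def overall_confidence(chosen_log_probs):
    """exp(-mean negative log-probability) over the redesigned residues."""
    if not chosen_log_probs:
        raise ValueError("need at least one redesigned residue")
    mean_nll = -sum(chosen_log_probs) / len(chosen_log_probs)
    return math.exp(-mean_nll)


# Hypothetical probabilities of the chosen residue at three redesigned positions.
probs = [0.9, 0.5, 0.7]
confidence = overall_confidence([math.log(p) for p in probs])
print(round(confidence, 3))
```

Under this reading the score is the geometric mean of the per-residue probabilities, so a single very uncertain position pulls it down more than it would an arithmetic average.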

Alternative weights

Others have fine-tuned ProteinMPNN for different use cases. You can use the following weights by changing the “Model Type” parameter:

  • SolubleMPNN - trained on soluble proteins
  • AbMPNN - trained on antibodies
  • HyperMPNN - trained on hyperthermophilic proteins
  • LigandMPNN - takes ligand atoms into account

You can also check out ThermoMPNN (uses ProteinMPNN embeddings to identify thermostable point mutations).
