BoltzGen designs proteins, peptides, nanobodies, and other molecules that bind to target structures or small molecules. It achieves 60-70% success rates for nanobody and protein binders against novel targets with no more than 15 designs tested per target.

Experimental Validation

BoltzGen has been validated in 8 independent wet-lab campaigns spanning diverse targets and modalities.

Novel Targets (No Similar Bound Structures in PDB)

Nanobody & Protein Design: 66% success rate (6/9 targets with nM binders) testing ≤15 designs per target
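| Target | Nanobody best Kd | Protein best Kd |
|--------|------------------|-----------------|
| PHYH   | 7.8 nM           | 22 nM           |
| PMVK   | 6.1 nM           | 10 nM           |
| RFK    | 8.8 nM           | -               |
| MZB1   | 120 nM           | 9.8 nM          |
| AMBP   | -                | 53 nM           |
| IDI2   | 520 nM           | 26 nM           |
| HNMT   | 99 nM            | -               |
| GM2A   | -                | 270 nM          |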
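
The 6/9 figure can be checked against the table. The sketch below is only an illustration and rests on three assumptions: that nine targets were tested per modality (the table lists only the eight with at least one binder), that a dash means no binder was found, and that the fused row labels split as MZB1 / 120 nM and IDI2 / 520 nM. The names `BEST_KD_NM` and `hits` are made up for this example.

```python
# Best Kd in nM per target as (nanobody, protein), transcribed from the table above.
# None stands for "-" (assumed to mean no binder was found).
BEST_KD_NM = {
    "PHYH": (7.8, 22.0),
    "PMVK": (6.1, 10.0),
    "RFK": (8.8, None),
    "MZB1": (120.0, 9.8),
    "AMBP": (None, 53.0),
    "IDI2": (520.0, 26.0),
    "HNMT": (99.0, None),
    "GM2A": (None, 270.0),
}
TARGETS_TESTED = 9  # taken from the "6/9" figure; the ninth target is not in the table


def hits(column: int) -> int:
    """Count targets with a reported binder (column 0 = nanobody, 1 = protein)."""
    return sum(1 for kds in BEST_KD_NM.values() if kds[column] is not None)


for name, column in (("nanobody", 0), ("protein", 1)):
    n = hits(column)
    print(f"{name}: {n}/{TARGETS_TESTED} = {n / TARGETS_TESTED:.1%}")
```

Under these assumptions both modalities come out at 6 of 9 targets, which matches the stated rate.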

Benchmark Targets

80% success rate for both nanobodies and proteins against 5 clinical targets (IL-7Rα, InsulinR, PDGFR, PD-L1, TNFα)

Other Modalities

  • Bioactive Peptides: nM-μM binders for melittin, indolicidin, protegrin (6 designs per target)
  • RagC Linear Peptides: 7/29 designs bound (best: 3.5 μM)
  • RagA:RagC Cyclic Peptides: 14/24 designs bound (best: 80 μM)
  • NPM1 Disordered Region: 1/5 designs showed nucleolar localization in live cells
  • Small Molecules: Weak binding (30-250 μM) for rucaparib and rhodamine derivative
  • GyrA Antimicrobials: 19.5% of 1,808 designs inhibited E. coli growth >4×

Design Tasks

Nanobody Design

Design single-domain antibodies against protein targets. Required:
  • Target PDB file
  • Length range (e.g., 110-130)
Optional: Target chains, binding site, number of designs (default: 10), batches (default: 1), budget (default: 2)
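
The required and optional fields above can be gathered into one validated structure before a job is submitted. This is a minimal sketch, not an official schema: the class name, the field names, and the representation of the binding site as a list of residue indices are all assumptions. Only the defaults (10 designs, 1 batch, budget 2) come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NanobodyDesignRequest:
    """Hypothetical container mirroring the fields listed above."""

    target_pdb: str                                  # required: path to the target PDB file
    min_length: int                                  # required: lower end of the length range
    max_length: int                                  # required: upper end of the length range
    target_chains: Optional[Sequence[str]] = None    # optional
    binding_site: Optional[Sequence[int]] = None     # optional; residue indices (assumed format)
    num_designs: int = 10                            # documented default
    batches: int = 1                                 # documented default
    budget: int = 2                                  # documented default

    def __post_init__(self) -> None:
        if not self.target_pdb:
            raise ValueError("target_pdb is required")
        if not 0 < self.min_length <= self.max_length:
            raise ValueError("length range must satisfy 0 < min <= max")
        if min(self.num_designs, self.batches, self.budget) < 1:
            raise ValueError("num_designs, batches and budget must be positive")


request = NanobodyDesignRequest("target.pdb", 110, 130)
print(request.num_designs, request.batches, request.budget)
```

The same shape would fit protein and peptide tasks with different length ranges.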

Protein Binder Design

Design general protein binders. Required:
  • Target PDB file
  • Length range (e.g., 100-150)
Optional: Target chains, binding site, number of designs, batches, budget

Peptide Design

Design linear or cyclic peptides. Required:
  • Target PDB file
  • Length range (e.g., 10-20)
Optional: Target chains, binding site, cyclic flag, number of designs, batches, budget

Small Molecule Binder Design

Design proteins that bind small molecules. Required:
  • Target SMILES string
  • Length range (e.g., 100-150)
Optional: Number of designs, batches, budget
Note: This task currently achieves only weak binding (30-250 μM); consider generating 10,000+ designs for better results

Custom Design

Advanced mode for complex designs with multiple entities and constraints. Required:
  • Entities: List of proteins and ligands with:
    • Type: “protein” or “ligand”
    • Chain ID
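    • Sequence (numbers mark designed regions, letters mark fixed residues: “15..20” is a designed region of 15 to 20 residues, and “AAVTT15” is the fixed residues AAVTT followed by 15 designed residues)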
    • CCD Code or SMILES (for ligands)
Optional: Binding types, secondary structure, cyclic flag, constraints (disulfide bonds), number of designs, batches, budget
Use Cases: Disulfide-bonded cyclic peptides, stapled peptides, multi-chain complexes
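
To make the sequence notation concrete, here is a small parser sketch. It assumes a grammar inferred from the two examples above rather than a documented one: a run of uppercase letters is a fixed stretch, a bare number N is N designed residues, and A..B is a designed stretch of between A and B residues. The function names are illustrative.

```python
import re
from typing import List, Tuple, Union

# One token is a range ("15..20"), a single count ("15"), or a run of residue letters.
_TOKEN = re.compile(r"(\d+)\.\.(\d+)|(\d+)|([A-Z]+)")

Segment = Tuple[str, Union[str, Tuple[int, int]]]


def parse_sequence_spec(spec: str) -> List[Segment]:
    """Split a spec such as 'AAVTT15' into fixed and designed segments."""
    segments: List[Segment] = []
    pos = 0
    while pos < len(spec):
        match = _TOKEN.match(spec, pos)
        if match is None:
            raise ValueError(f"unexpected character {spec[pos]!r} at position {pos}")
        low, high, count, letters = match.groups()
        if letters is not None:
            segments.append(("fixed", letters))
        elif count is not None:
            segments.append(("designed", (int(count), int(count))))
        else:
            if int(low) > int(high):
                raise ValueError(f"empty range {low}..{high}")
            segments.append(("designed", (int(low), int(high))))
        pos = match.end()
    return segments


def length_bounds(spec: str) -> Tuple[int, int]:
    """Return the minimum and maximum total chain length implied by a spec."""
    shortest = longest = 0
    for kind, value in parse_sequence_spec(spec):
        if kind == "fixed":
            shortest += len(value)
            longest += len(value)
        else:
            shortest += value[0]
            longest += value[1]
    return shortest, longest


print(parse_sequence_spec("AAVTT15"))
print(length_bounds("15..20"))
```

Computing the length bounds up front is a cheap way to catch a spec that is longer or shorter than intended before a run is started.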

Best Practices

Starting Out:
  • Test run: 10-50 designs
  • Production: 10,000-60,000 designs for challenging targets
  • Budget: 2-100 final diverse designs
Target Selection:
  • Binding sites should have ≥3 hydrophobic residues
  • Avoid heavily glycosylated regions
  • Specify binding site when possible
Expected Success Rates:
  • Nanobodies/Proteins: 60-70% (nM affinity) for novel targets
  • Peptides: Lower affinity (μM range) but good hit rates
  • Small molecules: Challenging (μM affinity)
Troubleshooting:
  • No successes: Try different binding sites or longer runs
  • Low expression: Check for hydrophobic patches
  • Avoid binder lengths of 73-76 residues: known memorization issue (the model generates ubiquitin)
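
Two of the rules above are easy to check before a run: the hydrophobic-residue count at the binding site and the 73-76 length window. The sketch below is illustrative only. The set of residues treated as hydrophobic is an assumption (the page does not define it), and the function names are hypothetical.

```python
from typing import List

# Assumed definition of "hydrophobic"; adjust to your own convention.
HYDROPHOBIC = set("AVILMFWY")
MEMORIZED_MIN, MEMORIZED_MAX = 73, 76  # length window flagged in the text


def hydrophobic_count(site_residues: str) -> int:
    """Count hydrophobic residues in a one-letter string of binding-site residues."""
    return sum(1 for residue in site_residues.upper() if residue in HYDROPHOBIC)


def overlaps_memorized_lengths(min_length: int, max_length: int) -> bool:
    """True if the requested length range includes any length from 73 to 76."""
    return min_length <= MEMORIZED_MAX and max_length >= MEMORIZED_MIN


def check_design_inputs(site_residues: str, min_length: int, max_length: int) -> List[str]:
    """Return human-readable warnings; an empty list means no rule was tripped."""
    warnings = []
    if hydrophobic_count(site_residues) < 3:
        warnings.append("binding site has fewer than 3 hydrophobic residues")
    if overlaps_memorized_lengths(min_length, max_length):
        warnings.append("length range includes 73-76 (ubiquitin memorization)")
    return warnings


print(check_design_inputs("LKVDF", 100, 150))
print(check_design_inputs("KDES", 70, 80))
```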

Pipeline Output

When the pipeline completes, your output directory will contain:

Configuration Files

  • config/ — Configuration files
  • steps.yaml — Pipeline steps configuration

Initial Designs

intermediate_designs/ — Output of design step
  • *.cif — CIF structure files for designed proteins and targets before inverse folding
  • *.npz — Metadata files for designs

Processed Designs

intermediate_designs_inverse_folded/ — Output of inverse folding, folding, and analysis steps
  • *.cif — CIF files after inverse folding (designed residues have backbone atoms only; sidechain coordinates are 0,0,0)
  • *.npz — Metadata files
  • refold_cif/ — Refolded complex structures (target + binder). Primary input for analysis and filtering
  • refold_design_cif/ — Refolded binder structures without target
  • aggregate_metrics_analyze.csv — Aggregated metrics across all designs
  • per_target_metrics_analyze.csv — Metrics per target

Final Results

final_ranked_designs/ — Output of filtering step
  • intermediate_ranked_[N]_designs/ — Top-N quality designs (CIFs copied from refold_cif/)
  • final_[budget]_designs/ — Final quality + diversity set (CIFs copied from refold_cif/)
  • all_designs_metrics.csv — Metrics for all designs considered by filtering
  • final_designs_metrics_[budget].csv — Metrics for selected final set
  • results_overview.pdf — Visualization plots
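
A short helper can locate the main result files in this layout. This sketch assumes that `[budget]` in the names above is replaced by the integer budget and that the metrics files sit directly inside `final_ranked_designs/`. The function names and dictionary keys are made up for illustration.

```python
from pathlib import Path
from typing import Dict, List, Union


def expected_outputs(output_dir: Union[str, Path], budget: int) -> Dict[str, Path]:
    """Map short labels to the result paths described above."""
    root = Path(output_dir)
    ranked = root / "final_ranked_designs"
    return {
        "refolded_complexes": root / "intermediate_designs_inverse_folded" / "refold_cif",
        "final_designs": ranked / f"final_{budget}_designs",
        "final_metrics": ranked / f"final_designs_metrics_{budget}.csv",
        "all_metrics": ranked / "all_designs_metrics.csv",
        "overview": ranked / "results_overview.pdf",
    }


def missing_outputs(output_dir: Union[str, Path], budget: int) -> List[str]:
    """Return the labels whose paths do not exist, sorted alphabetically."""
    paths = expected_outputs(output_dir, budget)
    return sorted(label for label, path in paths.items() if not path.exists())


for label, path in expected_outputs("my_run", budget=2).items():
    print(f"{label}: {path}")
```

Calling `missing_outputs` after a run is a quick way to tell whether the filtering step finished.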

Key Metrics in CSV Files

Quality Metrics:
  • design_ptm — Predicted TM-score for designed structure (higher = better, >0.75 recommended)
  • design_iptm — Predicted TM-score for design-target interactions (higher = better)
  • filter_rmsd — RMSD to refolded structure (lower = better, <2.5Å recommended)
Interface Metrics:
  • plip_hbonds_refolded — Number of hydrogen bonds between design and target
  • plip_saltbridge_refolded — Number of salt bridge interactions
  • delta_sasa_refolded — Change in solvent accessible surface area (higher = better burial)
Developability Metrics:
  • design_hydrophobicity — Hydrophobicity score of designed residues
  • design_largest_hydrophobic_patch_refolded — Area of largest hydrophobic patch (lower = better)
  • liability_score — Overall developability score (lower = better)
Ranking:
  • final_rank — Final ranking position (1 = best)
  • pass_filters — Binary flag indicating whether design passed all filters
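
The recommended thresholds can be applied directly to a metrics CSV. The sketch below uses only the column names listed above plus a hypothetical `id` column, and the rows are invented for illustration, not real results. It leaves out `pass_filters` because the page does not say how that flag is encoded.

```python
import csv
import io
from typing import Dict, Iterable, List


def select_designs(
    rows: Iterable[Dict[str, str]],
    min_ptm: float = 0.75,   # recommended: design_ptm > 0.75
    max_rmsd: float = 2.5,   # recommended: filter_rmsd < 2.5 Å
) -> List[Dict[str, str]]:
    """Keep rows that meet both thresholds, ordered by final_rank (1 = best)."""
    kept = [
        row
        for row in rows
        if float(row["design_ptm"]) > min_ptm and float(row["filter_rmsd"]) < max_rmsd
    ]
    return sorted(kept, key=lambda row: int(row["final_rank"]))


# Invented example rows; a real run would read final_designs_metrics_[budget].csv instead.
EXAMPLE_CSV = """id,design_ptm,filter_rmsd,final_rank
a,0.82,1.9,2
b,0.70,1.2,1
c,0.91,3.1,3
d,0.88,0.8,4
"""

selected = select_designs(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
print([row["id"] for row in selected])
```

In the example, design b is dropped for low `design_ptm` and design c for high `filter_rmsd`, so a and d remain in rank order.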

Runtime

Approximate time per design for ~200 residues:
  • ~60 sec (generation)
  • ~5 sec (inverse folding)
  • ~60 sec (structure prediction)
  • ~20 sec total (filtering all designs)
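
These figures give a rough way to budget a run. The estimate below assumes that the per-design steps run one after another, that time scales linearly with the number of designs, and that work divides evenly across parallel workers. All three are simplifications, and the `workers` parameter is hypothetical.

```python
GENERATION_S = 60             # per design
INVERSE_FOLDING_S = 5         # per design
STRUCTURE_PREDICTION_S = 60   # per design
FILTERING_TOTAL_S = 20        # once, for all designs


def estimated_runtime_hours(num_designs: int, workers: int = 1) -> float:
    """Rough wall-clock estimate in hours for designs of ~200 residues."""
    if num_designs < 1 or workers < 1:
        raise ValueError("num_designs and workers must be positive")
    per_design = GENERATION_S + INVERSE_FOLDING_S + STRUCTURE_PREDICTION_S
    total_seconds = num_designs * per_design / workers + FILTERING_TOTAL_S
    return total_seconds / 3600


for n in (10, 10_000):
    print(f"{n} designs: about {estimated_runtime_hours(n):.1f} h on one worker")
```

At about 125 seconds per design, a 10,000-design production run comes to roughly 347 hours on a single worker, so runs of that size need parallel hardware in practice.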