BoltzGen designs proteins, peptides, nanobodies, and other molecules that bind to target structures or small molecules. It achieves 60-70% success rates for nanobody and protein binders against novel targets while testing no more than 15 designs per target.

Experimental Validation

BoltzGen has been validated in 8 independent wet-lab campaigns spanning diverse targets and modalities.

Novel Targets (No Similar Bound Structures in PDB)

Nanobody & Protein Design: 66% success rate (6/9 targets with nM binders), testing ≤15 designs per target

Target   Nanobody Best Kd   Protein Best Kd
PHYH     7.8 nM             22 nM
PMVK     6.1 nM             10 nM
RFK      8.8 nM             -
MZB1     120 nM             9.8 nM
AMBP     -                  53 nM
IDI2     520 nM             26 nM
HNMT     99 nM              -
GM2A     -                  270 nM

Benchmark Targets

80% success rate for both nanobodies and proteins against 5 clinical targets (IL-7Rα, InsulinR, PDGFR, PD-L1, TNFα)

Other Modalities

  • Bioactive Peptides: nM-μM binders for melittin, indolicidin, protegrin (6 designs per target)
  • RagC Linear Peptides: 7/29 designs bound (best: 3.5 μM)
  • RagA:RagC Cyclic Peptides: 14/24 designs bound (best: 80 μM)
  • NPM1 Disordered Region: 1/5 designs showed nucleolar localization in live cells
  • Small Molecules: Weak binding (30-250 μM) for rucaparib and rhodamine derivative
  • GyrA Antimicrobials: 19.5% of 1,808 designs inhibited E. coli growth >4×

Design Tasks

Nanobody Design

Design single-domain antibodies against protein targets.
Required:
  • Target PDB file
  • Target chains (optional, defaults to all)
Optional:
  • Binding site residues
  • Scaffold type: Default (de novo) or Custom (optimize existing framework)
Custom Framework Options:
  • Framework input type: Structure or Sequence
  • For structure input: Framework PDB file, framework chain, CDR regions to design
  • For sequence input: Framework sequence with CDR placeholders using number ranges (e.g., EVQLVESGGGLVQPGGSLRLSCAASG5..10WVRQAPGKGLEWVS8..12RFTISRDNSKNTLYLQMNSLRAEDTAVYYC10..20WGQGTLVTVSS)
  • CDR exclude counts: Residues to remove from start of each CDR (e.g., “3,3,7”)
  • CDR insertion lengths: Ranges for new residues per CDR (e.g., “1-5,1-5,1-14”)
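
The placeholder notation for sequence input can be checked before submitting a job. The Python sketch below is illustrative only and is not part of BoltzGen: `parse_design_spec` and `length_bounds` are hypothetical helper names. It assumes that `N..M` marks a designed region of N to M residues and that a bare number (as in “AAVTT15” under Custom Design) marks a designed region of exactly that length.

```python
import re
from typing import List, Tuple, Union

# A segment is either a fixed amino-acid string or a (min, max) designed-length range.
Segment = Union[str, Tuple[int, int]]

# Order matters: try "N..M" before a bare number.
_TOKEN = re.compile(r"(\d+)\.\.(\d+)|(\d+)|([A-Za-z]+)")


def parse_design_spec(spec: str) -> List[Segment]:
    """Split a spec such as 'AAS5..10WVR' into fixed and designed segments."""
    segments: List[Segment] = []
    pos = 0
    for match in _TOKEN.finditer(spec):
        if match.start() != pos:  # characters skipped by the regex are invalid
            raise ValueError(f"unexpected text at offset {pos}")
        pos = match.end()
        low, high, exact, fixed = match.groups()
        if fixed is not None:
            segments.append(fixed.upper())
        elif exact is not None:
            segments.append((int(exact), int(exact)))
        else:
            if int(low) > int(high):
                raise ValueError(f"empty range {low}..{high}")
            segments.append((int(low), int(high)))
    if pos != len(spec):
        raise ValueError(f"unexpected text at offset {pos}")
    return segments


def length_bounds(segments: List[Segment]) -> Tuple[int, int]:
    """Return the shortest and longest total length the spec allows."""
    shortest = sum(len(s) if isinstance(s, str) else s[0] for s in segments)
    longest = sum(len(s) if isinstance(s, str) else s[1] for s in segments)
    return shortest, longest


framework = (
    "EVQLVESGGGLVQPGGSLRLSCAASG5..10WVRQAPGKGLEWVS8..12"
    "RFTISRDNSKNTLYLQMNSLRAEDTAVYYC10..20WGQGTLVTVSS"
)
segments = parse_design_spec(framework)
cdr_ranges = [s for s in segments if not isinstance(s, str)]
print(cdr_ranges)  # [(5, 10), (8, 12), (10, 20)]
print(length_bounds(segments))
```

The three ranges correspond to the three CDR placeholders in the example framework. `length_bounds` gives a quick check that the overall nanobody length stays in a sensible range.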

Antibody Design

Design antibodies with heavy and light chains against protein targets.
Required:
  • Target PDB file
  • Target chains (optional, defaults to all)
Optional:
  • Binding site residues
  • Scaffold type: Default (de novo) or Custom (optimize existing framework)
Custom Framework Options:
  • Framework input type: Structure or Sequence
  • For structure input: Framework PDB file, heavy chain ID, light chain ID, heavy/light CDR regions
  • For sequence input: Heavy chain framework sequence and light chain framework sequence with CDR placeholders
  • Heavy/Light CDR exclude counts and insertion length ranges

Protein Binder Design

Design general protein binders.
Required:
  • Target PDB file
  • Length range (e.g., 100-150)
Optional: Target chains, binding site, number of designs, batches, budget

Peptide Design

Design linear or cyclic peptides.
Required:
  • Target PDB file
  • Length range (e.g., 10-20)
Optional: Target chains, binding site, cyclic flag, number of designs, batches, budget

Cyclotide Design

Design cyclic peptides with disulfide bonds.
Required:
  • Target PDB file
  • Cyclotide sequence specification with cysteines for disulfide bonds (e.g., “3C8C6C5C3C1C2” means 3 residues, Cys, 8 residues, Cys, etc.)
  • Disulfide bond pairs (positions of cysteine pairs to form bonds)
Optional: Target chains, binding site, number of designs, batches, budget
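
To work out which positions the disulfide bond pairs should refer to, the cyclotide specification can be expanded into cysteine positions. The sketch below is a hypothetical helper, not BoltzGen code. It assumes positions are 1-based and counted along the full peptide; check the input form for the indexing it actually expects. The pairing used in the example is only illustrative.

```python
import re
from typing import Iterable, List, Tuple


def cysteine_positions(spec: str) -> Tuple[List[int], int]:
    """Return (1-based cysteine positions, total length) for a spec like '3C8C6C5C3C1C2'."""
    if not re.fullmatch(r"(?:\d+|C)+", spec):
        raise ValueError("spec may contain only digits and the letter C")
    positions: List[int] = []
    length = 0
    for token in re.findall(r"\d+|C", spec):
        if token == "C":
            length += 1
            positions.append(length)
        else:
            length += int(token)  # a run of designed residues
    return positions, length


def check_disulfide_pairs(pairs: Iterable[Tuple[int, int]], positions: List[int]) -> None:
    """Raise ValueError if a pair uses a non-cysteine position or reuses a cysteine."""
    used = set()
    for pair in pairs:
        for index in pair:
            if index not in positions:
                raise ValueError(f"position {index} is not a cysteine")
            if index in used:
                raise ValueError(f"cysteine {index} appears in more than one pair")
            used.add(index)


positions, total = cysteine_positions("3C8C6C5C3C1C2")
print(positions, total)  # [4, 13, 20, 26, 30, 32] 34

# Illustrative pairing: first with fourth, second with fifth, third with sixth cysteine.
check_disulfide_pairs([(4, 26), (13, 30), (20, 32)], positions)
```

Six cysteines give three disulfide bonds, so the example specification describes a 34-residue peptide with three bond pairs.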

Small Molecule Binder Design

Design proteins that bind small molecules.
Required:
  • Target SMILES string
  • Length range (e.g., 100-150)
Optional: Number of designs, batches, budget
Note: Currently achieves only weak (30-250 μM) binding; consider 10,000+ designs for better results

Custom Design

Advanced mode for complex designs with multiple entities and constraints.
Required:
  • Entities: List of proteins and ligands with:
    • Type: “protein” or “ligand”
    • Chain ID
    • Sequence (use numbers for designed regions: “15..20”, letters for fixed: “AAVTT15”)
    • CCD Code or SMILES (for ligands)
Optional: Binding types, secondary structure, cyclic flag, constraints (disulfide bonds), number of designs, batches, budget
Use Cases: Disulfide-bonded cyclic peptides, stapled peptides, multi-chain complexes
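
The entity fields listed above can be modeled as a small record with basic validation. This Python sketch is only a way to sanity-check inputs before filling in the form. The `Entity` class and `validate_entities` function are hypothetical and do not reflect BoltzGen's internal schema. The rules that a ligand carries exactly one of a CCD code or a SMILES string, and that chain IDs are unique, are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    type: str                       # "protein" or "ligand"
    chain_id: str
    sequence: Optional[str] = None  # proteins: letters are fixed, numbers are designed
    ccd: Optional[str] = None       # ligands: CCD code
    smiles: Optional[str] = None    # ligands: SMILES string

    def validate(self) -> None:
        if self.type == "protein":
            if not self.sequence:
                raise ValueError(f"chain {self.chain_id}: protein needs a sequence")
        elif self.type == "ligand":
            # Assumption: exactly one ligand identifier is given.
            if bool(self.ccd) == bool(self.smiles):
                raise ValueError(f"chain {self.chain_id}: give a CCD code or a SMILES string")
        else:
            raise ValueError(f"chain {self.chain_id}: unknown type {self.type!r}")


def validate_entities(entities: List[Entity]) -> None:
    """Validate each entity and require unique chain IDs (an assumption)."""
    chain_ids = [entity.chain_id for entity in entities]
    if len(chain_ids) != len(set(chain_ids)):
        raise ValueError("chain IDs must be unique")
    for entity in entities:
        entity.validate()


# Illustrative input: a partly designed protein chain plus an ethanol ligand.
entities = [
    Entity(type="protein", chain_id="A", sequence="AAVTT15"),
    Entity(type="ligand", chain_id="B", smiles="CCO"),
]
validate_entities(entities)
```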

YAML Configuration

Provide a custom YAML configuration file with associated structure files for advanced control.
Required:
  • YAML configuration file
  • Structure file(s) (PDB or CIF format)
  • Protocol selection: Nanobody, Protein, Small Molecule Binder, or Peptide

Common Parameters

Number of Designs:
  • Default: 10
  • Range: 1-100,000
  • For large runs, design batching automatically splits into multiple jobs
Design Batching:
  • Auto-enabled when number of designs > 10
  • Configurable batch size (default: 10 designs per job)
Budget:
  • Number of designs in the final set, selected for both quality and diversity
  • Default: 2
Omit Amino Acids:
  • Amino acids to exclude from the design
  • Default: C (cysteine) for peptide and nanobody design, none for others
  • Use “empty” to include all amino acids
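
The batching rules above amount to simple arithmetic. The sketch below shows one way a run could be split into jobs. `plan_batches` is a hypothetical helper, and the handling of a final partial batch is an assumption, since the page does not say how a remainder is scheduled.

```python
from typing import List


def plan_batches(num_designs: int, batch_size: int = 10) -> List[int]:
    """Return the number of designs per job for a run of num_designs."""
    if not 1 <= num_designs <= 100_000:
        raise ValueError("number of designs must be between 1 and 100,000")
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    if num_designs <= 10:
        return [num_designs]  # batching is only auto-enabled above 10 designs
    full_jobs, remainder = divmod(num_designs, batch_size)
    # Assumption: leftover designs run as one smaller final job.
    return [batch_size] * full_jobs + ([remainder] if remainder else [])


print(plan_batches(25))            # [10, 10, 5]
print(len(plan_batches(10_000)))   # 1000
```

With the default batch size, a 10,000-design production run therefore becomes 1,000 jobs.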

Best Practices

Starting Out:
  • Test run: 10-50 designs
  • Production: 10,000-60,000 designs for challenging targets
  • Budget: 2-100 final diverse designs
Target Selection:
  • Binding sites should have ≥3 hydrophobic residues
  • Avoid heavily glycosylated regions
  • Specify binding site when possible
Expected Success Rates:
  • Nanobodies/Proteins: 60-70% (nM affinity) for novel targets
  • Peptides: Lower affinity (μM range) but good hit rates
  • Small molecules: Challenging (μM affinity)
Troubleshooting:
  • No successes: Try different binding sites or larger runs (more designs)
  • Low expression: Check for hydrophobic patches
  • Avoid lengths of 73-76 residues: Known memorization issue (the model generates ubiquitin)
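
A length range can be checked against the 73-76 residue window before a run is launched. This is a minimal sketch with a hypothetical helper name; it only encodes the guideline stated above.

```python
from typing import List

# Lengths flagged on this page as prone to producing ubiquitin.
MEMORIZED_LENGTHS = range(73, 77)  # 73, 74, 75, 76


def risky_lengths(min_len: int, max_len: int) -> List[int]:
    """Return the lengths within [min_len, max_len] that fall in the flagged window."""
    if min_len > max_len:
        raise ValueError("min_len must not exceed max_len")
    return [n for n in range(min_len, max_len + 1) if n in MEMORIZED_LENGTHS]


print(risky_lengths(70, 80))    # [73, 74, 75, 76]
print(risky_lengths(100, 150))  # []
```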

Pipeline Output

When the pipeline completes, your output directory will contain:

Configuration Files

  • config/ — Configuration files
  • steps.yaml — Pipeline steps configuration

Initial Designs

intermediate_designs/ — Output of design step
  • *.cif — CIF structure files for designed proteins and targets before inverse folding
  • *.npz — Metadata files for designs

Processed Designs

intermediate_designs_inverse_folded/ — Output of inverse folding, folding, and analysis steps
  • *.cif — CIF files after inverse folding (designed residues have backbone atoms only; sidechain coordinates are 0,0,0)
  • *.npz — Metadata files
  • refold_cif/ — Refolded complex structures (target + binder). Primary input for analysis and filtering
  • refold_design_cif/ — Refolded binder structures without target
  • aggregate_metrics_analyze.csv — Aggregated metrics across all designs
  • per_target_metrics_analyze.csv — Metrics per target

Final Results

final_ranked_designs/ — Output of filtering step
  • intermediate_ranked_[N]_designs/ — Top-N quality designs (CIFs copied from refold_cif/)
  • final_[budget]_designs/ — Final quality + diversity set (CIFs copied from refold_cif/)
  • all_designs_metrics.csv — Metrics for all designs considered by filtering
  • final_designs_metrics_[budget].csv — Metrics for selected final set
  • results_overview.pdf — Visualization plots

Key Metrics in CSV Files

Quality Metrics:
  • design_ptm — Predicted TM-score for designed structure (higher = better, >0.75 recommended)
  • design_iptm — Predicted TM-score for design-target interactions (higher = better)
  • filter_rmsd — RMSD to refolded structure (lower = better, <2.5Å recommended)
Interface Metrics:
  • plip_hbonds_refolded — Number of hydrogen bonds between design and target
  • plip_saltbridge_refolded — Number of salt bridge interactions
  • delta_sasa_refolded — Change in solvent accessible surface area (higher = better burial)
Developability Metrics:
  • design_hydrophobicity — Hydrophobicity score of designed residues
  • design_largest_hydrophobic_patch_refolded — Area of largest hydrophobic patch (lower = better)
  • liability_score — Overall developability score (lower = better)
Ranking:
  • final_rank — Final ranking position (1 = best)
  • pass_filters — Binary flag indicating whether design passed all filters
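
The recommended thresholds can be applied to a metrics CSV with the Python standard library. The sketch below uses only column names listed above and made-up demo rows. It assumes `final_rank` is numeric and present in the file being read. It does not use `pass_filters`, because the page does not say how that flag is encoded.

```python
import csv
import io
from typing import Dict, Iterable, List


def select_designs(
    rows: Iterable[Dict[str, str]],
    min_ptm: float = 0.75,
    max_rmsd: float = 2.5,
) -> List[Dict[str, str]]:
    """Keep rows with design_ptm > min_ptm and filter_rmsd < max_rmsd, best rank first."""
    kept = [
        row for row in rows
        if float(row["design_ptm"]) > min_ptm and float(row["filter_rmsd"]) < max_rmsd
    ]
    return sorted(kept, key=lambda row: int(float(row["final_rank"])))


# Made-up rows standing in for a metrics CSV; real files contain more columns.
demo_csv = io.StringIO(
    "final_rank,design_ptm,filter_rmsd\n"
    "1,0.70,1.2\n"
    "2,0.82,1.9\n"
    "3,0.91,3.1\n"
    "4,0.88,2.0\n"
)
selected = select_designs(csv.DictReader(demo_csv))
print([row["final_rank"] for row in selected])  # ['2', '4']
```

For a real run, open the metrics file with `open(path, newline="")` and pass `csv.DictReader(handle)` to `select_designs` in the same way.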

Runtime

Approximate runtime for designs of ~200 residues:
  • Generation: ~60 sec per design
  • Inverse folding: ~5 sec per design
  • Structure prediction: ~60 sec per design
  • Filtering: ~20 sec total for all designs
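
These figures give a rough way to budget a run. The estimate below is a back-of-the-envelope sketch, not a measured benchmark. It assumes the per-design stages run back to back, that designs are spread evenly over `workers` parallel jobs (a hypothetical parameter), and that filtering runs once at the end.

```python
import math

GENERATION_S = 60
INVERSE_FOLDING_S = 5
STRUCTURE_PREDICTION_S = 60
FILTERING_TOTAL_S = 20


def estimate_runtime_seconds(num_designs: int, workers: int = 1) -> int:
    """Estimate wall-clock seconds for designs of ~200 residues."""
    if num_designs < 1 or workers < 1:
        raise ValueError("num_designs and workers must be positive")
    per_design = GENERATION_S + INVERSE_FOLDING_S + STRUCTURE_PREDICTION_S  # 125 s
    designs_per_worker = math.ceil(num_designs / workers)
    return designs_per_worker * per_design + FILTERING_TOTAL_S


print(estimate_runtime_seconds(10))                   # 1270
print(estimate_runtime_seconds(10_000, workers=100))  # 12520
```

Under these assumptions, a 10-design test run takes about 21 minutes on one worker, and a 10,000-design run spread over 100 workers takes about 3.5 hours.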