BoltzGen designs proteins, peptides, nanobodies, and other molecules that bind to target structures or small molecules. It achieves 60-70% success rates for nanobody and protein binders against novel targets with at most 15 designs tested per target.
Experimental Validation
Validated across 8 independent wetlab campaigns spanning diverse targets and modalities.
Novel Targets (No Similar Bound Structures in PDB)
Nanobody & Protein Design: 66% success rate (6 of 9 targets yielded nM binders) when testing ≤15 designs per target
| Target | Nanobody Best Kd | Protein Best Kd |
| --- | --- | --- |
| PHYH | 7.8 nM | 22 nM |
| PMVK | 6.1 nM | 10 nM |
| RFK | 8.8 nM | - |
| MZB1 | 120 nM | 9.8 nM |
| AMBP | - | 53 nM |
| IDI2 | 520 nM | 26 nM |
| HNMT | 99 nM | - |
| GM2A | - | 270 nM |
Benchmark Targets
80% success rate for both nanobodies and proteins against 5 clinical targets (IL-7Rα, InsulinR, PDGFR, PD-L1, TNFα)
Other Modalities
Bioactive Peptides: nM-μM binders for melittin, indolicidin, protegrin (6 designs per target)
RagC Linear Peptides: 7/29 designs bound (best: 3.5 μM)
RagA:RagC Cyclic Peptides: 14/24 designs bound (best: 80 μM)
NPM1 Disordered Region: 1/5 designs showed nucleolar localization in live cells
Small Molecules: Weak binding (30-250 μM) for rucaparib and a rhodamine derivative
GyrA Antimicrobials: 19.5% of 1,808 designs inhibited E. coli growth >4×
Design Tasks
Nanobody Design
Design single-domain antibodies against protein targets.
Required:
- Target PDB file
- Length range (e.g., 110-130)
Optional: Target chains, binding site, number of designs (default: 10), batches (default: 1), budget (default: 2)
Protein Binder Design
Design general protein binders.
Required:
- Target PDB file
- Length range (e.g., 100-150)
Optional: Target chains, binding site, number of designs, batches, budget
Peptide Design
Design linear or cyclic peptides.
Required:
- Target PDB file
- Length range (e.g., 10-20)
Optional: Target chains, binding site, cyclic flag, number of designs, batches, budget
Small Molecule Binder Design
Design proteins that bind small molecules.
Required:
- Target SMILES string
- Length range (e.g., 100-150)
Optional: Number of designs, batches, budget
Note: Currently achieves weak (30-250 μM) binding; consider 10,000+ designs for better results
Custom Design
Advanced mode for complex designs with multiple entities and constraints.
Required:
- Entities: List of proteins and ligands, each with:
  - Type: “protein” or “ligand”
  - Chain ID
  - Sequence (numbers mark designed regions, e.g. “15..20”; letters mark fixed residues, e.g. “AAVTT15”)
  - CCD Code or SMILES (for ligands)
Optional: Binding types, secondary structure, cyclic flag, constraints (disulfide bonds), number of designs, batches, budget
Use Cases: Disulfide-bonded cyclic peptides, stapled peptides, multi-chain complexes
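The sequence notation used in Custom Design can be made concrete with a small parser. This is a minimal sketch and not BoltzGen's own parser: it assumes that a bare number N stands for N designed residues, that "A..B" stands for a designed region of A to B residues, and that letters are fixed residues. The function names are hypothetical.

```python
import re

# A token is either a run of letters (fixed residues) or a number,
# optionally followed by "..N" (a designed region of variable length).
# ASSUMPTION: this grammar is inferred from the examples "15..20" and "AAVTT15".
TOKEN = re.compile(r"([A-Za-z]+)|(\d+)(?:\.\.(\d+))?")


def parse_sequence_spec(spec):
    """Split a spec such as 'AAVTT15' into fixed and designed segments."""
    segments = []
    pos = 0
    while pos < len(spec):
        match = TOKEN.match(spec, pos)
        if match is None:
            raise ValueError(f"unexpected character {spec[pos]!r} at position {pos}")
        fixed, low, high = match.groups()
        if fixed:
            segments.append(("fixed", fixed.upper()))
        else:
            low_n = int(low)
            high_n = int(high) if high is not None else low_n
            if high_n < low_n:
                raise ValueError(f"empty range {low_n}..{high_n}")
            segments.append(("designed", low_n, high_n))
        pos = match.end()
    return segments


def length_bounds(segments):
    """Return the (min, max) total chain length implied by the segments."""
    min_len = sum(len(s[1]) if s[0] == "fixed" else s[1] for s in segments)
    max_len = sum(len(s[1]) if s[0] == "fixed" else s[2] for s in segments)
    return min_len, max_len


print(parse_sequence_spec("AAVTT15"))  # [('fixed', 'AAVTT'), ('designed', 15, 15)]
print(length_bounds(parse_sequence_spec("15..20")))  # (15, 20)
```

Under these assumptions, "AAVTT15" reads as five fixed residues followed by 15 designed ones, for a total chain length of 20.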
Best Practices
Starting Out:
- Test run: 10-50 designs
- Production: 10,000-60,000 designs for challenging targets
- Budget: 2-100 final diverse designs
Target Selection:
- Binding sites should have ≥3 hydrophobic residues
- Avoid heavily glycosylated regions
- Specify binding site when possible
Expected Success Rates:
- Nanobodies/Proteins: 60-70% (nM affinity) for novel targets
- Peptides: Lower affinity (μM range) but good hit rates
- Small molecules: Challenging (μM affinity)
Troubleshooting:
- No successes: Try different binding sites or longer runs
- Low expression: Check for hydrophobic patches
- Avoid lengths 73-76: Known memorization issue (generates ubiquitin)
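The length warning above is easy to check before submitting a job. The following is a minimal sketch with a hypothetical helper name. It only tests whether a requested length range includes any of the lengths 73-76 and says nothing about how BoltzGen itself samples lengths.

```python
# Lengths flagged in the troubleshooting note (known memorization issue).
MEMORIZED_LENGTHS = range(73, 77)  # 73, 74, 75, 76


def flagged_lengths(min_len, max_len):
    """Return the lengths in [min_len, max_len] that fall in the 73-76 window."""
    if min_len > max_len:
        raise ValueError("min_len must not exceed max_len")
    return [n for n in MEMORIZED_LENGTHS if min_len <= n <= max_len]


print(flagged_lengths(110, 130))  # []
print(flagged_lengths(70, 75))    # [73, 74, 75]
```

If the returned list is not empty, one option is to shift the range or split it into two runs on either side of the window.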
Pipeline Output
When the pipeline completes, your output directory will contain:
Configuration Files
config/ — Configuration files
steps.yaml — Pipeline steps configuration
Initial Designs
intermediate_designs/ — Output of design step
- *.cif — CIF structure files for designed proteins and targets before inverse folding
- *.npz — Metadata files for designs
Processed Designs
intermediate_designs_inverse_folded/ — Output of inverse folding, folding, and analysis steps
- *.cif — CIF files after inverse folding (designed residues have backbone atoms only; sidechain coordinates are 0,0,0)
- *.npz — Metadata files
- refold_cif/ — Refolded complex structures (target + binder). Primary input for analysis and filtering
- refold_design_cif/ — Refolded binder structures without target
- aggregate_metrics_analyze.csv — Aggregated metrics across all designs
- per_target_metrics_analyze.csv — Metrics per target
Final Results
final_ranked_designs/ — Output of filtering step
- intermediate_ranked_[N]_designs/ — Top-N quality designs (CIFs copied from refold_cif/)
- final_[budget]_designs/ — Final quality + diversity set (CIFs copied from refold_cif/)
- all_designs_metrics.csv — Metrics for all designs considered by filtering
- final_designs_metrics_[budget].csv — Metrics for selected final set
- results_overview.pdf — Visualization plots
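A quick way to confirm that a run finished is to check for the final result files. The sketch below assumes that the files listed under Final Results sit directly inside final_ranked_designs/ and that [budget] is replaced by the budget value. Both points are inferred from the listing rather than confirmed, and the function names are hypothetical.

```python
from pathlib import Path


def expected_final_outputs(output_dir, budget):
    """Build the paths the filtering step is expected to produce.

    ASSUMPTION: all entries are direct children of final_ranked_designs/.
    """
    final = Path(output_dir) / "final_ranked_designs"
    return [
        final / f"final_{budget}_designs",
        final / "all_designs_metrics.csv",
        final / f"final_designs_metrics_{budget}.csv",
        final / "results_overview.pdf",
    ]


def missing_outputs(output_dir, budget):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_final_outputs(output_dir, budget) if not p.exists()]


# Example: report anything missing for a run with budget 2.
for path in missing_outputs("my_run_output", budget=2):
    print(f"missing: {path}")
```

The directory name my_run_output is a placeholder. An empty result means every expected file and folder is present.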
Key Metrics in CSV Files
Quality Metrics:
design_ptm — Predicted TM-score for designed structure (higher = better, >0.75 recommended)
design_iptm — Predicted TM-score for design-target interactions (higher = better)
filter_rmsd — RMSD to refolded structure (lower = better, <2.5 Å recommended)
Interface Metrics:
plip_hbonds_refolded — Number of hydrogen bonds between design and target
plip_saltbridge_refolded — Number of salt bridge interactions
delta_sasa_refolded — Change in solvent accessible surface area (higher = better burial)
Developability Metrics:
design_hydrophobicity — Hydrophobicity score of designed residues
design_largest_hydrophobic_patch_refolded — Area of largest hydrophobic patch (lower = better)
liability_score — Overall developability score (lower = better)
Ranking:
final_rank — Final ranking position (1 = best)
pass_filters — Binary flag indicating whether design passed all filters
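The recommended thresholds above (design_ptm > 0.75, filter_rmsd < 2.5 Å) can be applied to a metrics CSV with the standard library alone. This is a minimal sketch on made-up rows: the id column and all values are hypothetical, and it assumes the CSV carries design_ptm, filter_rmsd, and final_rank as plain numeric columns. It is not the pipeline's own filtering logic.

```python
import csv
import io

PTM_MIN = 0.75   # recommended lower bound for design_ptm
RMSD_MAX = 2.5   # recommended upper bound for filter_rmsd, in Å


def select_designs(rows, ptm_min=PTM_MIN, rmsd_max=RMSD_MAX):
    """Keep rows that meet both thresholds, ordered by final_rank (1 = best)."""
    kept = [
        row for row in rows
        if float(row["design_ptm"]) > ptm_min and float(row["filter_rmsd"]) < rmsd_max
    ]
    return sorted(kept, key=lambda row: int(row["final_rank"]))


# Hypothetical sample data standing in for a real metrics CSV.
SAMPLE_CSV = """id,design_ptm,filter_rmsd,final_rank
design_a,0.82,1.4,2
design_b,0.70,1.1,1
design_c,0.91,3.2,3
design_d,0.88,0.9,4
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
selected = select_designs(rows)
print([row["id"] for row in selected])  # ['design_a', 'design_d']
```

To run this on real output, replace the in-memory sample with an open file handle on the metrics CSV and adjust the column names if they differ.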
Runtime
Approximate time per design for ~200 residues:
- ~60 sec (generation)
- ~5 sec (inverse folding)
- ~60 sec (structure prediction)
- ~20 sec (filtering; total across all designs, not per design)
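These figures give a rough way to budget a run. The sketch below simply adds up the per-design step times and the one-off filtering time. It assumes designs are processed one after another on a single device, with no batching or parallelism, so real wall-clock times may be considerably lower. The function name is hypothetical.

```python
# Approximate step times for ~200-residue designs, taken from the list above.
GENERATION_S = 60
INVERSE_FOLDING_S = 5
STRUCTURE_PREDICTION_S = 60
FILTERING_TOTAL_S = 20  # one-off cost covering all designs


def estimate_runtime_seconds(num_designs):
    """Estimate total runtime, assuming strictly sequential processing."""
    per_design = GENERATION_S + INVERSE_FOLDING_S + STRUCTURE_PREDICTION_S
    return num_designs * per_design + FILTERING_TOTAL_S


print(estimate_runtime_seconds(50))  # 6270
print(f"{estimate_runtime_seconds(10_000) / 3600:.0f} hours")  # 347 hours
```

At about 125 seconds per design, a 50-design test run takes under two hours, while a 10,000-design production run would need roughly two weeks on one sequential device, which is why splitting work across devices matters at that scale.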