Experimental Validation
Validated across 8 independent wetlab campaigns spanning diverse targets and modalities.
Novel Targets (No Similar Bound Structures in PDB)
Nanobody & Protein Design: 66% success rate (6/9 targets with nM binders) testing ≤15 designs per target

| Target | Nanobody Best Kd | Protein Best Kd |
|---|---|---|
| PHYH | 7.8 nM | 22 nM |
| PMVK | 6.1 nM | 10 nM |
| RFK | 8.8 nM | - |
| MZB1 | 120 nM | 9.8 nM |
| AMBP | - | 53 nM |
| IDI2 | 520 nM | 26 nM |
| HNMT | 99 nM | - |
| GM2A | - | 270 nM |
Benchmark Targets
80% success rate for both nanobodies and proteins against 5 clinical targets (IL-7Rα, InsulinR, PDGFR, PD-L1, TNFα)
Other Modalities
- Bioactive Peptides: nM-μM binders for melittin, indolicidin, protegrin (6 designs per target)
- RagC Linear Peptides: 7/29 designs bound (best: 3.5 μM)
- RagA:RagC Cyclic Peptides: 14/24 designs bound (best: 80 μM)
- NPM1 Disordered Region: 1/5 designs showed nucleolar localization in live cells
- Small Molecules: Weak binding (30-250 μM) for rucaparib and rhodamine derivative
- GyrA Antimicrobials: 19.5% of 1,808 designs inhibited E. coli growth >4×
Design Tasks
Nanobody Design
Design single-domain antibodies against protein targets. Required:
- Target PDB file
- Target chains (optional, defaults to all)
- Binding site residues
- Scaffold type: Default (de novo) or Custom (optimize existing framework)
- Framework input type: Structure or Sequence
- For structure input: Framework PDB file, framework chain, CDR regions to design
- For sequence input: Framework sequence with CDR placeholders using number ranges (e.g., EVQLVESGGGLVQPGGSLRLSCAASG5..10WVRQAPGKGLEWVS8..12RFTISRDNSKNTLYLQMNSLRAEDTAVYYC10..20WGQGTLVTVSS)
- CDR exclude counts: Residues to remove from start of each CDR (e.g., “3,3,7”)
- CDR insertion lengths: Ranges for new residues per CDR (e.g., “1-5,1-5,1-14”)
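The placeholder notation above (letters for fixed framework residues, `min..max` for a designed region) can be read mechanically. The following is a minimal Python sketch of such a reader, not the tool's own parser. The helper names `parse_sequence_spec` and `length_bounds` are hypothetical, and the sketch assumes a bare number such as `15` means a designed region of exactly that length.

```python
import re
from typing import List, Tuple, Union

# A segment is either fixed residues (str) or a designed region given as
# an inclusive (min_len, max_len) tuple.
Segment = Union[str, Tuple[int, int]]

# Either a run of letters, or a number optionally followed by "..number".
_TOKEN = re.compile(r"([A-Za-z]+)|(\d+)(?:\.\.(\d+))?")


def parse_sequence_spec(spec: str) -> List[Segment]:
    """Split a spec such as 'EVQ5..10WVR' into fixed and designed segments."""
    segments: List[Segment] = []
    pos = 0
    while pos < len(spec):
        match = _TOKEN.match(spec, pos)
        if match is None:
            raise ValueError(f"unexpected character {spec[pos]!r} at index {pos}")
        letters, low, high = match.groups()
        if letters is not None:
            segments.append(letters.upper())
        else:
            lo = int(low)
            hi = int(high) if high is not None else lo  # assumption: "15" == "15..15"
            if lo > hi:
                raise ValueError(f"invalid range {lo}..{hi}")
            segments.append((lo, hi))
        pos = match.end()
    return segments


def length_bounds(segments: List[Segment]) -> Tuple[int, int]:
    """Return the shortest and longest total length the spec allows."""
    shortest = sum(len(s) if isinstance(s, str) else s[0] for s in segments)
    longest = sum(len(s) if isinstance(s, str) else s[1] for s in segments)
    return shortest, longest


if __name__ == "__main__":
    segs = parse_sequence_spec("EVQ5..10WVR8..12")
    print(segs)                 # ['EVQ', (5, 10), 'WVR', (8, 12)]
    print(length_bounds(segs))  # (19, 28)
```

Running the same helper on a full framework specification gives the minimum and maximum nanobody length a run can produce, which is a quick sanity check before submitting.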
Antibody Design
Design antibodies with heavy and light chains against protein targets. Required:
- Target PDB file
- Target chains (optional, defaults to all)
- Binding site residues
- Scaffold type: Default (de novo) or Custom (optimize existing framework)
- Framework input type: Structure or Sequence
- For structure input: Framework PDB file, heavy chain ID, light chain ID, heavy/light CDR regions
- For sequence input: Heavy chain framework sequence and light chain framework sequence with CDR placeholders
- Heavy/Light CDR exclude counts and insertion length ranges
Protein Binder Design
Design general protein binders. Required:
- Target PDB file
- Length range (e.g., 100-150)
Peptide Design
Design linear or cyclic peptides. Required:
- Target PDB file
- Length range (e.g., 10-20)
Cyclotide Design
Design cyclic peptides with disulfide bonds. Required:
- Target PDB file
- Cyclotide sequence specification with cysteines for disulfide bonds (e.g., “3C8C6C5C3C1C2” means 3 residues, Cys, 8 residues, Cys, etc.)
- Disulfide bond pairs (positions of cysteine pairs to form bonds)
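To see which positions the cysteines occupy in a specification like “3C8C6C5C3C1C2”, the string can be expanded by hand or with a few lines of Python. This is a sketch under stated assumptions: positions are counted from 1 (the tool's indexing convention is not given here), the helper names are hypothetical, and the check that every cysteine appears in exactly one pair is an assumption about what a valid pairing looks like.

```python
import re
from typing import List, Sequence, Tuple


def cysteine_positions(spec: str) -> Tuple[List[int], int]:
    """Return 1-based cysteine positions and total length for a cyclotide spec."""
    if not re.fullmatch(r"(?:\d+|[A-Za-z])+", spec):
        raise ValueError(f"malformed specification: {spec!r}")
    positions: List[int] = []
    length = 0
    # Each token is either a residue count or a single fixed residue letter.
    for count, residue in re.findall(r"(\d+)|([A-Za-z])", spec):
        if count:
            length += int(count)
        else:
            length += 1
            if residue.upper() == "C":
                positions.append(length)
    return positions, length


def check_disulfide_pairs(pairs: Sequence[Tuple[int, int]],
                          positions: Sequence[int]) -> None:
    """Raise ValueError unless every cysteine is used in exactly one pair."""
    used = sorted(p for pair in pairs for p in pair)
    if used != sorted(positions):
        raise ValueError(f"pairs {list(pairs)} do not cover cysteines {list(positions)}")


if __name__ == "__main__":
    positions, length = cysteine_positions("3C8C6C5C3C1C2")
    print(positions, length)  # [4, 13, 20, 26, 30, 32] 34
    # Illustrative pairing only (I-IV, II-V, III-VI); choose pairs for your own design.
    check_disulfide_pairs([(4, 26), (13, 30), (20, 32)], positions)
```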
Small Molecule Binder Design
Design proteins that bind small molecules. Required:
- Target SMILES string
- Length range (e.g., 100-150)
Custom Design
Advanced mode for complex designs with multiple entities and constraints. Required:
- Entities: List of proteins and ligands with:
  - Type: “protein” or “ligand”
  - Chain ID
  - Sequence (use numbers for designed regions: “15..20”, letters for fixed: “AAVTT15”)
  - CCD Code or SMILES (for ligands)
YAML Configuration
Provide a custom YAML configuration file with associated structure files for advanced control. Required:
- YAML configuration file
- Structure file(s) (PDB or CIF format)
- Protocol selection: Nanobody, Protein, Small Molecule Binder, or Peptide
Common Parameters
Number of Designs:
- Default: 10
- Range: 1-100,000
- For large runs, design batching automatically splits the run into multiple jobs
- Batching is auto-enabled when the number of designs exceeds 10
- Configurable batch size (default: 10 designs per job)
Budget:
- Final number of designs, optimized for diversity and quality
- Default: 2
Amino acids to exclude from the design:
- Default: C (cysteine) for peptide and nanobody design, none for others
- Use “empty” to include all amino acids
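The batching rule for large runs comes down to simple arithmetic. The sketch below shows how a requested number of designs would split into jobs. It assumes that any remainder goes into one final, smaller job, which the description above does not specify, and `plan_batches` is a hypothetical name.

```python
from typing import List

BATCHING_THRESHOLD = 10  # batching is auto-enabled above this many designs


def plan_batches(num_designs: int, batch_size: int = 10) -> List[int]:
    """Return the number of designs assigned to each job."""
    if not 1 <= num_designs <= 100_000:
        raise ValueError("num_designs must be between 1 and 100,000")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if num_designs <= BATCHING_THRESHOLD:
        return [num_designs]  # small runs stay in a single job
    full_jobs, remainder = divmod(num_designs, batch_size)
    # Assumption: leftover designs form one extra, smaller job.
    return [batch_size] * full_jobs + ([remainder] if remainder else [])


if __name__ == "__main__":
    print(plan_batches(25))           # [10, 10, 5]
    print(len(plan_batches(10_000)))  # 1000
```

With the default batch size, a 10,000-design production run therefore corresponds to about 1,000 jobs.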
Best Practices
Starting Out:
- Test run: 10-50 designs
- Production: 10,000-60,000 designs for challenging targets
- Budget: 2-100 final diverse designs
Target Selection:
- Binding sites should have ≥3 hydrophobic residues
- Avoid heavily glycosylated regions
- Specify binding site when possible
Expected Success Rates:
- Nanobodies/Proteins: 60-70% (nM affinity) for novel targets
- Peptides: Lower affinity (μM range) but good hit rates
- Small molecules: Challenging (μM affinity)
Troubleshooting:
- No successes: Try different binding sites or longer runs
- Low expression: Check for hydrophobic patches
- Avoid lengths 73-76: Known memorization issue (generates ubiquitin)
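Two of these recommendations are easy to check before submitting a job: the hydrophobic residue count of the binding site and whether the requested length range touches 73-76. The following pre-flight check is a rough sketch. The function name is hypothetical, and the set of residues treated as hydrophobic (A, V, I, L, M, F, W, Y) is a common textbook choice assumed here, not a definition taken from the tool.

```python
from typing import Iterable, List

# Assumption: one-letter codes counted as hydrophobic for this check.
HYDROPHOBIC = frozenset("AVILMFWY")
MEMORIZED_MIN, MEMORIZED_MAX = 73, 76  # lengths reported to yield ubiquitin


def preflight_warnings(binding_site: Iterable[str],
                       min_len: int, max_len: int) -> List[str]:
    """Return human-readable warnings for a planned design run."""
    warnings: List[str] = []
    n_hydrophobic = sum(1 for aa in binding_site if aa.upper() in HYDROPHOBIC)
    if n_hydrophobic < 3:
        warnings.append(
            f"binding site has {n_hydrophobic} hydrophobic residues; at least 3 recommended"
        )
    # The two inclusive ranges overlap if the larger start is <= the smaller end.
    if max(min_len, MEMORIZED_MIN) <= min(max_len, MEMORIZED_MAX):
        warnings.append("length range includes 73-76; consider excluding these lengths")
    return warnings


if __name__ == "__main__":
    # Binding-site residues given as one-letter codes.
    print(preflight_warnings("LFWKE", 100, 150))  # []
    for message in preflight_warnings("KEDS", 70, 80):
        print("warning:", message)
```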
Pipeline Output
When the pipeline completes, your output directory will contain:
Configuration Files
config/ — Configuration files
- steps.yaml — Pipeline steps configuration
Initial Designs
intermediate_designs/ — Output of design step
- *.cif — CIF structure files for designed proteins and targets before inverse folding
- *.npz — Metadata files for designs
Processed Designs
intermediate_designs_inverse_folded/ — Output of inverse folding, folding, and analysis steps
- *.cif — CIF files after inverse folding (designed residues have backbone atoms only; sidechain coordinates are 0,0,0)
- *.npz — Metadata files
- refold_cif/ — Refolded complex structures (target + binder). Primary input for analysis and filtering
- refold_design_cif/ — Refolded binder structures without target
- aggregate_metrics_analyze.csv — Aggregated metrics across all designs
- per_target_metrics_analyze.csv — Metrics per target
Final Results
final_ranked_designs/ — Output of filtering step
- intermediate_ranked_[N]_designs/ — Top-N quality designs (CIFs copied from refold_cif/)
- final_[budget]_designs/ — Final quality + diversity set (CIFs copied from refold_cif/)
- all_designs_metrics.csv — Metrics for all designs considered by filtering
- final_designs_metrics_[budget].csv — Metrics for selected final set
- results_overview.pdf — Visualization plots
Key Metrics in CSV Files
Quality Metrics:
- design_ptm — Predicted TM-score (pTM) for the designed structure (higher = better, >0.75 recommended)
- design_iptm — Interface predicted TM-score (ipTM) for design-target interactions (higher = better)
- filter_rmsd — RMSD to refolded structure (lower = better, <2.5 Å recommended)
Interaction Metrics:
- plip_hbonds_refolded — Number of hydrogen bonds between design and target
- plip_saltbridge_refolded — Number of salt bridge interactions
- delta_sasa_refolded — Change in solvent accessible surface area (higher = better burial)
Developability Metrics:
- design_hydrophobicity — Hydrophobicity score of designed residues
- design_largest_hydrophobic_patch_refolded — Area of largest hydrophobic patch (lower = better)
- liability_score — Overall developability score (lower = better)
Ranking:
- final_rank — Final ranking position (1 = best)
- pass_filters — Binary flag indicating whether design passed all filters
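The pipeline already writes final_rank and pass_filters, but the recommended thresholds can also be applied by hand, for example to re-examine all_designs_metrics.csv with stricter cutoffs. The sketch below uses only the standard library. The sample rows and the `id` column are invented for illustration, and the real CSV may name its identifier column differently.

```python
import csv
import io
from typing import Dict, List


def select_designs(rows: List[Dict[str, str]],
                   min_ptm: float = 0.75,
                   max_rmsd: float = 2.5) -> List[Dict[str, str]]:
    """Keep rows meeting the recommended cutoffs, best design_iptm first."""
    kept = [
        row for row in rows
        if float(row["design_ptm"]) > min_ptm and float(row["filter_rmsd"]) < max_rmsd
    ]
    return sorted(kept, key=lambda row: float(row["design_iptm"]), reverse=True)


# Invented example data; in practice read the CSV from the output directory, e.g.
#   with open(path_to_csv, newline="") as fh: rows = list(csv.DictReader(fh))
SAMPLE = """id,design_ptm,design_iptm,filter_rmsd
a,0.82,0.71,1.4
b,0.70,0.90,1.1
c,0.91,0.65,3.0
d,0.78,0.80,2.0
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print([row["id"] for row in select_designs(rows)])  # ['d', 'a']
```

Design b is dropped for low design_ptm and design c for high filter_rmsd; the remaining two are ordered by design_iptm.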
Runtime
Approximate time per design for ~200 residues:
- ~60 sec (generation)
- ~5 sec (inverse folding)
- ~60 sec (structure prediction)
- ~20 sec total (filtering all designs)
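These figures give a back-of-the-envelope estimate for a whole run: roughly 125 seconds per design plus about 20 seconds of filtering. The sketch below does that arithmetic. The `parallel_jobs` parameter assumes ideal scaling across jobs, which is a simplification and not a measured property of the service.

```python
import math

GENERATION_S = 60            # per design
INVERSE_FOLDING_S = 5        # per design
STRUCTURE_PREDICTION_S = 60  # per design
FILTERING_TOTAL_S = 20       # once, for all designs


def estimate_runtime_seconds(num_designs: int, parallel_jobs: int = 1) -> int:
    """Estimate wall-clock seconds for designs of about 200 residues."""
    if num_designs < 1 or parallel_jobs < 1:
        raise ValueError("num_designs and parallel_jobs must be positive")
    per_design = GENERATION_S + INVERSE_FOLDING_S + STRUCTURE_PREDICTION_S
    # Assumption: designs are spread evenly over jobs that run fully in parallel.
    designs_per_job = math.ceil(num_designs / parallel_jobs)
    return designs_per_job * per_design + FILTERING_TOTAL_S


if __name__ == "__main__":
    print(estimate_runtime_seconds(10))                      # 1270
    print(round(estimate_runtime_seconds(10_000) / 3600))    # 347 (hours, sequential)
    print(estimate_runtime_seconds(10_000, parallel_jobs=100))  # 12520
```

A 10-design test run therefore takes around 21 minutes, while a sequential 10,000-design run would take about two weeks, which is why large runs depend on batching into multiple jobs.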