Methodology
Germinal uses a three-stage pipeline that jointly optimizes structure prediction confidence and antibody sequence likelihood, so that antibody structure and sequence are designed together.
1. Hallucination (Design Stage)
The design stage inverts AlphaFold-Multimer to sample sequences that bind desired epitopes with high predicted confidence. Simultaneously, AbLang2 sequence likelihoods bias sampling toward naturally occurring antibody sequences. This dual-objective optimization navigates a trade-off between structural confidence and antibody naturalness. The hallucination proceeds through three phases:
- Logits phase: Initial sequence exploration with gradually increasing language model influence
- Softmax phase: Temperature annealing to refine sequence selection
- Semi-greedy phase: Final sequence commitment with strong language model guidance
Additional loss terms constrain the designed structure:
- Paratope loss: Ensures binding occurs through CDR regions rather than framework residues
- α-helix loss: Prevents CDRs from forming rigid helical structures
- β-strand loss: Encourages flexible loop conformations characteristic of functional antibodies
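
The softmax-phase annealing and the weighted dual objective described above can be sketched in a few lines. This is only an illustration under stated assumptions: the linear schedule, the temperature values, and the names `annealed_temperature` and `combined_loss` are hypothetical and are not taken from Germinal's code. The semi-greedy phase is not modeled here.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over one sequence position; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def annealed_temperature(step, n_steps, t_start=1.0, t_end=0.01):
    """Hypothetical linear schedule from t_start down to t_end (assumed values)."""
    frac = step / max(n_steps - 1, 1)
    return t_start + frac * (t_end - t_start)

def combined_loss(structure_loss, lm_loss, lm_weight):
    """Dual objective: structure term plus a weighted language model term."""
    return structure_loss + lm_weight * lm_loss

logits = [2.0, 1.0, 0.5]  # toy logits for three candidate residues
for step in range(5):
    t = annealed_temperature(step, 5)
    probs = softmax(logits, t)
    print(f"step={step} T={t:.3f} max_prob={max(probs):.3f}")
```

As the temperature falls, the probability mass concentrates on the highest-scoring residue, which is the sense in which annealing refines sequence selection. Raising `lm_weight` over time would correspond to increasing language model influence.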
2. Sequence Optimization
Sequences passing initial structural filters proceed to AbMPNN (antibody-specific ProteinMPNN), which redesigns CDR residues not in direct contact with the antigen. This improves binder stability while preserving the binding interface.
3. Filtering & Validation
Final designs are co-folded using Chai-1 to provide an independent structural assessment. Strict confidence thresholds (ipTM, pLDDT) identify candidates with the highest probability of experimental success.
Continuous Generation
Each Germinal job operates as a continuous loop that repeatedly generates and evaluates designs until one accepted design is found. Each iteration:
1. Generates a new candidate through hallucination
2. Applies initial structural filters
3. Runs AbMPNN sequence optimization on passing candidates
4. Applies final Chai-1 confidence filters
5. If the design passes all filters, the job completes; otherwise, it loops back to step 1
The failure_counts.csv output file tracks how many trajectories failed at each step, which is useful for understanding the acceptance rate for your specific target.
To generate multiple designs, increase the Number of Designs parameter. This launches additional parallel jobs, each independently looping until it finds one accepted design.
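
The control flow of a single job can be sketched as follows. The stage functions are passed in as placeholders, and the stage names, the `run_job` helper, and the `max_iters` safety cap are assumptions made for illustration. The description above only states that a job loops until one design is accepted.

```python
import random
from collections import Counter

def run_job(hallucinate, passes_initial, optimize, passes_final, max_iters=1000):
    """Loop until one design passes every filter, counting failures per stage."""
    failures = Counter()
    for _ in range(max_iters):  # max_iters is an assumed safety cap
        design = hallucinate()
        if not passes_initial(design):
            failures["initial_filters"] += 1
            continue
        design = optimize(design)
        if not passes_final(design):
            failures["final_filters"] += 1
            continue
        return design, failures
    return None, failures

# Toy stand-ins for the real stages: a design is just a random score.
rng = random.Random(0)
design, failures = run_job(
    hallucinate=lambda: {"score": rng.random()},
    passes_initial=lambda d: d["score"] > 0.5,
    optimize=lambda d: d,
    passes_final=lambda d: d["score"] > 0.9,
)
print(design, dict(failures))
```

The `failures` counter plays the role that failure_counts.csv plays in a real run: it shows which stage rejects most trajectories for a given target.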
Configuration
Required Settings
| Setting | Type | Description |
|---|---|---|
| Task | Dropdown | VHH (Nanobody) or scFv (Single-Chain Variable Fragment) |
| Target PDB | File | Protein structure of your target (.pdb format) |
| Target Chain | String | Chain ID to design a binder against (e.g., A) |
Design Parameters
| Setting | Type | Default | Description |
|---|---|---|---|
| Hotspot Residues | List | Empty | Target residue numbers for the binder to focus on (e.g., 37,39,41,96,98). Leave empty to allow the model to find optimal binding sites. |
| Framework | Dropdown | Generic | Generic: Use standard nanobody/scFv frameworks from Germinal. Custom: Upload your own framework structure. |
| Number of Designs | Integer | 1 | Number of parallel jobs to launch. Each job loops continuously until it finds one accepted design. |
| Omit Amino Acids | String | C | Amino acids to exclude from CDR designs (comma-separated). Cysteine is omitted by default to prevent unwanted disulfide bridges. |
Custom Framework Settings
These settings appear only when Framework is set to Custom:
| Setting | Type | Description |
|---|---|---|
| Framework Structure | File | Your antibody framework PDB. Germinal will only design CDR regions; framework residues remain fixed. The structure does not need to be in a bound pose. |
| Binder Chain | String | Chain ID of your framework to design CDRs for (e.g., A) |
| CDR Lengths | String | Comma-separated CDR region lengths. For VHH: 11,8,18 means HCDR1=11, HCDR2=8, HCDR3=18 residues. For scFv: provide 6 values for heavy then light chains (e.g., 8,8,13,6,6,9). |
| Framework Lengths | String | Comma-separated framework region lengths. For VHH: 25,17,38,14 (HFW1-4). For scFv: 25,17,38,52,17,33,10 (HFW1-4+linker+LFW1-4). |
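
Because incorrect length annotations cause design failures, it can help to check them before launching a job. The sketch below assumes the regions alternate as FW1, CDR1, FW2, CDR2, and so on, which is how the examples above read. The helper names `parse_lengths` and `region_layout` are hypothetical and not part of Germinal.

```python
def parse_lengths(text):
    """Turn a comma-separated string such as '25,17,38,14' into a list of ints."""
    return [int(tok) for tok in text.split(",")]

def region_layout(framework_lengths, cdr_lengths):
    """Interleave FW1, CDR1, FW2, ... and return (name, start, end) spans.

    Positions are 0-based and end-exclusive.
    """
    if len(framework_lengths) != len(cdr_lengths) + 1:
        raise ValueError("expected one more framework region than CDRs")
    spans, pos = [], 0
    for i, fw in enumerate(framework_lengths):
        spans.append((f"FW{i + 1}", pos, pos + fw))
        pos += fw
        if i < len(cdr_lengths):
            spans.append((f"CDR{i + 1}", pos, pos + cdr_lengths[i]))
            pos += cdr_lengths[i]
    return spans

vhh = region_layout(parse_lengths("25,17,38,14"), parse_lengths("11,8,18"))
for name, start, end in vhh:
    print(name, start, end)
print("total length:", vhh[-1][2])  # 131 for the VHH example values
```

Comparing the total against the number of residues in the binder chain of the uploaded framework PDB catches most annotation mistakes. For the scFv example values the same function gives a total of 242.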
scFv-Specific Settings (Custom Framework Only)
| Setting | Type | Default | Description |
|---|---|---|---|
| VH First | Boolean | true | Whether the heavy chain appears first in the sequence |
| VH Length | Integer | — | Length of the variable heavy domain (include linker length) |
| VL Length | Integer | — | Length of the variable light domain |
Advanced Settings
| Setting | Type | Description |
|---|---|---|
| Use Rosetta | Boolean | Enable PyRosetta for additional biophysical scoring. Contact [email protected] if you have a license. |
Best Practices
Epitope Selection Strategy
Germinal excels at blocking specific protein-protein interactions (PPIs). For optimal results:
- Provide hotspot residues corresponding to known functional interfaces to steer the design toward competitive binders
- Focus on accessible epitopes: Surface-exposed residues with clear structural definition yield higher success rates
- Consider epitope size: 3-8 hotspot residues typically provide sufficient guidance without over-constraining the design
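
A small pre-submission check for the Hotspot Residues field might look like this. The function names are hypothetical, and the 3-8 range is simply the guidance given above, not a limit enforced by Germinal.

```python
def parse_hotspots(text):
    """Parse a string such as '37,39,41,96,98' into sorted, unique residue numbers."""
    if not text.strip():
        return []  # empty field: the model is left to choose the binding site
    return sorted({int(tok) for tok in text.split(",") if tok.strip()})

def hotspot_count_ok(hotspots, low=3, high=8):
    """True when no hotspots are given or the count falls in the suggested range."""
    return not hotspots or low <= len(hotspots) <= high

hotspots = parse_hotspots("37,39,41,96,98")
print(hotspots, hotspot_count_ok(hotspots))
```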
Framework Selection
- Generic frameworks are recommended for most use cases and have been validated across diverse targets
- Custom frameworks enable designs on proprietary scaffolds with favorable developability profiles or humanization characteristics
- When using custom frameworks, ensure accurate CDR and framework length annotations—incorrect values will cause design failures
Understanding Output Metrics
Designs are ranked by structural confidence scores from Chai-1 co-folding:
| Metric | Threshold | Interpretation |
|---|---|---|
| ipTM | > 0.7 | High confidence in predicted interface; strong indicator of binding potential |
| pLDDT (binder) | > 0.8 | Well-folded CDR regions with defined structure |
| Interface contacts | Higher is better | More extensive interfaces correlate with tighter binding |
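
Applying these thresholds to a set of designs is a simple filter-and-sort. In the sketch below the dictionary keys (`iptm`, `plddt`) and the three example designs are invented for illustration. The actual column names in Germinal's output files may differ.

```python
def passes_confidence(metrics, iptm_min=0.7, plddt_min=0.8):
    """Apply the ipTM and binder pLDDT thresholds from the table above."""
    return metrics["iptm"] > iptm_min and metrics["plddt"] > plddt_min

# Made-up example values, not real Germinal output.
designs = [
    {"name": "d1", "iptm": 0.82, "plddt": 0.88},
    {"name": "d2", "iptm": 0.65, "plddt": 0.91},
    {"name": "d3", "iptm": 0.74, "plddt": 0.79},
]
kept = sorted(
    (d for d in designs if passes_confidence(d)),
    key=lambda d: d["iptm"],
    reverse=True,
)
print([d["name"] for d in kept])  # ['d1']
```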
Amino Acid Omission
By default, cysteine (C) is excluded from CDR designs to prevent:
- Unwanted disulfide bridge formation
- Protein aggregation issues
- Oxidation-related instability
Additional amino acids can be omitted (e.g., C,M to also exclude methionine) based on your expression system or stability requirements.
Experimental Validation
Germinal achieves 4–22% experimental success rates across diverse protein targets, including:
| Target | Type | Designs Tested | Binders Found | Best Affinity |
|---|---|---|---|---|
| PD-L1 | Immune checkpoint | 101 | 7 | 170 nM |
| IL3 | Cytokine | 46 | 2 | 560 nM |
| IL20 | Cytokine | 43 | 4 | 190 nM |
| BHRF1 | Viral protein | 52 | 11 | 140 nM |
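
The per-target success rates follow directly from the table (binders found divided by designs tested). The short calculation below uses only the numbers listed above.

```python
# (designs tested, binders found) per target, copied from the table
results = {
    "PD-L1": (101, 7),
    "IL3": (46, 2),
    "IL20": (43, 4),
    "BHRF1": (52, 11),
}
rates = {target: 100 * hits / tested for target, (tested, hits) in results.items()}
for target, rate in rates.items():
    print(f"{target}: {rate:.1f}%")
```

This gives roughly 6.9% for PD-L1, 4.3% for IL3, 9.3% for IL20, and 21.2% for BHRF1.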
Output Format
Germinal generates an organized output directory containing all designs and their associated metrics:
Key Output Files
| File | Description |
|---|---|
| accepted/structures/*.pdb | Final antibody-antigen complex structures for passing designs; these are your top candidates for experimental testing |
| accepted/designs.csv | Metrics and sequences for all accepted designs |
| all_trajectories.csv | Complete list of all designs with their metrics, pipeline stage reached, and structure file paths |
| failure_counts.csv | Diagnostic summary showing where designs failed in the pipeline |
The all_trajectories.csv file is particularly useful for understanding design quality across the full run, as it contains in silico metrics for every design that passed the hallucination stage, regardless of whether it was ultimately accepted.
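
One way to summarize such a file is to count rows by the pipeline stage each design reached. The sketch below runs on an in-memory stand-in. The header names `design` and `stage` are invented, so substitute the real column names from your all_trajectories.csv.

```python
import csv
import io
from collections import Counter

def count_by_column(csv_file, column):
    """Count rows per distinct value of `column` in an open CSV file."""
    return Counter(row[column] for row in csv.DictReader(csv_file))

# Toy stand-in for all_trajectories.csv; the header names are assumptions.
sample = io.StringIO(
    "design,stage\n"
    "a,accepted\n"
    "b,final_filters\n"
    "c,final_filters\n"
)
print(count_by_column(sample, "stage"))
```

For a real run, pass an open file handle instead, for example `open("all_trajectories.csv", newline="")`.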
Limitations
- Target size: Memory constraints favor smaller proteins; large targets should be truncated to regions of interest
- Protein epitopes only: Currently limited to protein targets (glycans, small molecules, and nucleic acids are not supported)
- Computational cost: Each design iteration requires structure prediction and backpropagation, making generation computationally intensive