Syntheogenesis · A method · v 0.1 · 2026
We did away with UV mutagenesis.
The chemical and ultraviolet techniques used to evolve enzymes in vitro share a single workplace hazard: operator exposure to DNA-damaging agents. Syntheogenesis removes them from the workflow.
Why this exists
UV mutagenesis is a workplace hazard.
We removed it from the workflow.
Directed evolution still relies, in many labs, on UV-C light or chemical mutagens (EMS, NTG, hydroxylamine) to generate diversity. Both approaches expose the operator to DNA-damaging agents, both are slow to iterate, and both produce libraries that are mostly junk: most random mutations are functionally inert, mildly destabilizing, or outright destructive.
i. Zero UV exposure
Mutation design moves from the wet bench into a Python process. No UV-C chamber sessions, no photokeratitis risk, no institutional sign-off for radiation use.
ii. Targeted, not random
Every mutation is one that ESM-2, a protein language model of 35 million to 3 billion parameters depending on configuration, scores as evolutionarily plausible. Empirical hit rates are five- to fifty-fold higher than random mutagenesis at equivalent screening cost.
iii. Traceable from day zero
Each variant ships with its mutation list, predicted fitness, codon-optimized DNA, primer pair, and PCR conditions. No Sanger-sequencing rounds to figure out what changed.
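The primer Tm behind the "primer pair and PCR conditions" output rests on nearest-neighbor thermodynamics (reference iii). Here is a minimal sketch, not the tool's own code. It assumes a perfectly matched, non-self-complementary primer, the Watson-Crick stack parameters tabulated by Allawi and SantaLucia (1997), and a standard entropic salt correction. The defaults of 250 nM primer and 50 mM Na⁺, and the name `primer_tm`, are hypothetical choices.

```python
import math

# Watson-Crick nearest-neighbor stacks keyed by 5'->3' top-strand dinucleotide:
# (dH kcal/mol, dS cal/(K*mol)), values as tabulated by Allawi & SantaLucia (1997).
NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
# Initiation terms, applied once per duplex end according to the terminal base pair
INIT_GC = (0.1, -2.8)
INIT_AT = (2.3, 4.1)
R = 1.987  # gas constant, cal/(K*mol)


def primer_tm(primer, primer_nM=250.0, na_mM=50.0):
    """Estimate Tm (deg C) of a perfectly matched, non-self-complementary primer.

    Hypothetical defaults: 250 nM primer, 50 mM Na+; Mg2+ and dNTPs are ignored.
    """
    seq = primer.upper()
    stacks = [NN[seq[i:i + 2]] for i in range(len(seq) - 1)]
    dh = sum(h for h, _ in stacks)
    ds = sum(s for _, s in stacks)
    for end in (seq[0], seq[-1]):
        h, s = INIT_GC if end in "GC" else INIT_AT
        dh += h
        ds += s
    # Entropic correction for monovalent salt: 0.368 * (N - 1) * ln[Na+]
    ds += 0.368 * (len(seq) - 1) * math.log(na_mM / 1000.0)
    # Two strands at equal concentration: effective concentration is C_T / 4
    ct = primer_nM * 1e-9
    return dh * 1000.0 / (ds + R * math.log(ct / 4.0)) - 273.15


print(round(primer_tm("AGCTGACCTGAAGTCCATGA"), 1))
```

Mismatches, dangling ends, and divalent cations are outside this sketch. A production calculator would need to handle them before its Tm could be trusted for PCR design.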
How it works
Four stages, one minute.
The pipeline is intentionally short. Long methods sections in this field tend to hide engineering decisions that matter. Here are the four that do.
01. Parse and translate.
FASTA, SnapGene .dna, GenBank, EMBL, raw DNA, and raw protein inputs are all auto-detected. Plasmid uploads surface a CDS picker so you can choose the right gene. Raw DNA with multiple in-frame stops triggers a six-frame ORF scan with a frame-aware picker.
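The six-frame ORF scan can be sketched as follows. The sketch assumes the standard genetic code and ORFs that open at the first ATG and close at the next in-frame stop. The 100-codon minimum is a hypothetical default, and `find_orfs` and its tuple output are illustrative, not the tool's interface.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest)
AA_ORDER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_ORDER)}


def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    # Unknown or ambiguous codons become "X"
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(dna, min_codons=100):
    """Return ATG...stop ORFs from all six frames, longest first.

    Each hit is (strand, frame, start, end, protein). For the "-" strand,
    start/end are coordinates on the reverse-complement sequence.
    """
    dna = dna.upper()
    orfs = []
    for strand, s in (("+", dna), ("-", revcomp(dna))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first ATG opens the ORF
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, pos + 3, translate(s[start:pos])))
                    start = None  # the stop codon closes the ORF
    return sorted(orfs, key=lambda o: o[3] - o[2], reverse=True)


print(find_orfs("CC" + "ATG" + "GCT" * 5 + "TAA" + "GG", min_codons=3))
```

Sorting longest-first mirrors the usual heuristic that the longest ORF is the most likely real gene. A frame-aware picker would still let the user override that guess.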
02. Zero-shot scoring with ESM-2.
For every position i and each of the 19 possible substitutions, compute ΔLL = log P(xᵢ = mutant | x^WT) − log P(xᵢ = wild type | x^WT), where x^WT is the unmasked wild-type sequence. This is the wild-type marginal scheme of Meier et al. (2021). The default model is ESM-2 35 M; it can be configured up to 3 B on GPU.
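Both terms in ΔLL come from the same softmax at position i, so the normalizer cancels and ΔLL is simply the mutant logit minus the wild-type logit. One forward pass over the unmasked wild-type sequence therefore scores all 19 × L substitutions. Below is a minimal sketch. It assumes the logits have already been extracted, trimmed to the 20 standard residues, and stripped of special-token positions. Random numbers stand in for ESM-2 output, and `wt_marginal_scores` is a hypothetical name.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(row):
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def wt_marginal_scores(wt_seq, logits):
    """ΔLL for all 19 substitutions at every position.

    logits[i][k] is the model's logit for AMINO_ACIDS[k] at residue i, taken
    from one forward pass over the unmasked wild-type sequence.
    """
    scores = {}
    for i, wt_aa in enumerate(wt_seq):
        log_p = log_softmax(logits[i])
        wt_log_p = log_p[AMINO_ACIDS.index(wt_aa)]
        for k, mt_aa in enumerate(AMINO_ACIDS):
            if mt_aa != wt_aa:
                # Key like "M1K": wild-type residue, 1-based position, mutant residue
                scores[f"{wt_aa}{i + 1}{mt_aa}"] = log_p[k] - wt_log_p
    return scores


# Demo: random logits stand in for real ESM-2 output
rng = random.Random(0)
wt = "MKV"
demo_logits = [[rng.gauss(0.0, 1.0) for _ in AMINO_ACIDS] for _ in wt]
ranked = sorted(wt_marginal_scores(wt, demo_logits).items(), key=lambda kv: kv[1], reverse=True)
print(ranked[:5])
```

Positive ΔLL means the model prefers the mutant residue over the wild type at that position. The top-percentile pool used in the next stage would be drawn from the high end of this ranking.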
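03. Combinatorial search.
Simulated annealing over the top-percentile pool of single-site mutations, with multiple restarts and geometric cooling. The objective is the summed score ΣΔLL of the chosen mutations, which treats their effects as additive. Penalties for stop codons and for placing two mutations at the same position are built into the objective.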
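Here is a minimal sketch of the search under stated assumptions. The pool is a list of hypothetical `(position, mutant_aa, delta_ll)` tuples, and each variant carries exactly k mutations. The objective is the summed ΔLL minus a flat penalty per repeated position, and each move swaps one chosen mutation for an unused one. The stop-codon penalty is omitted, and all schedule parameters are illustrative, not the tool's defaults.

```python
import math
import random


def anneal(pool, k, restarts=5, steps=2000, t0=1.0, alpha=0.995,
           dup_penalty=10.0, seed=0):
    """Pick k single-site mutations from `pool` maximizing summed ΔLL.

    pool: list of (position, mutant_aa, delta_ll) tuples (hypothetical format).
    """
    if not 0 < k < len(pool):
        raise ValueError("need 0 < k < len(pool)")
    rng = random.Random(seed)

    def objective(idxs):
        # Additive score minus a penalty for each extra hit on an already-used position
        positions = [pool[i][0] for i in idxs]
        duplicates = len(positions) - len(set(positions))
        return sum(pool[i][2] for i in idxs) - dup_penalty * duplicates

    best, best_score = None, -math.inf
    for _ in range(restarts):  # multi-restart from random starting sets
        current = rng.sample(range(len(pool)), k)
        cur_score = objective(current)
        if cur_score > best_score:
            best, best_score = list(current), cur_score
        t = t0
        for _ in range(steps):
            # Move: replace one chosen mutation with one not currently chosen
            unused = [i for i in range(len(pool)) if i not in current]
            candidate = list(current)
            candidate[rng.randrange(k)] = rng.choice(unused)
            cand_score = objective(candidate)
            delta = cand_score - cur_score
            # Metropolis rule: always accept gains, accept losses with prob exp(delta / t)
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, cur_score = candidate, cand_score
                if cur_score > best_score:
                    best, best_score = list(current), cur_score
            t *= alpha  # geometric cooling
    return [pool[i] for i in best], best_score


pool = [(1, "A", -0.1), (1, "G", -0.2), (2, "A", -0.3), (3, "L", -2.0),
        (4, "V", -0.05), (5, "K", -1.5), (6, "E", -3.0)]
variant, score = anneal(pool, k=3)
print(variant, round(score, 3))
```

The sketch returns the best state seen across all restarts, not the final state of each chain. That way a late uphill wander at moderate temperature cannot discard a good solution.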
04. Codon-optimize and clean.
Reverse-translate with E. coli, yeast, or human codon-usage tables. Synonymously remove BsaI and BsmBI recognition sites, which would otherwise be cut during Golden Gate assembly, along with NotI sites, so the library drops straight into a Golden Gate workflow. Export as CSV, Excel, GenBank, FASTA, or JSON.
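Site scrubbing can be sketched as a greedy loop, assuming an in-frame coding sequence and the standard genetic code. The loop finds the first recognition site on either strand: BsaI (GGTCTC), BsmBI (CGTCTC), NotI (GCGGCCGC), or a reverse complement. It then tries synonymous alternatives for each codon overlapping that site. The codon ranking is a placeholder (first codon in table order), not an E. coli, yeast, or human usage table, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
AA_ORDER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_ORDER)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

# Placeholder preference (first codon in table order); a real pipeline would
# rank synonyms by an organism's codon-usage frequencies instead.
PREFERRED = {aa: codons[0] for aa, codons in SYNONYMS.items()}

SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "NotI": "GCGGCCGC"}


def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Search both strands: the Type IIS sites are not palindromic
PATTERNS = sorted({p for site in SITES.values() for p in (site, revcomp(site))})


def find_sites(dna):
    return [(i, p) for p in PATTERNS
            for i in range(len(dna) - len(p) + 1) if dna.startswith(p, i)]


def reverse_translate(protein, preferred=PREFERRED):
    return "".join(preferred[aa] for aa in protein)


def scrub(dna):
    """Greedily remove sites with synonymous codon swaps (in-frame CDS assumed)."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    while True:
        hits = find_sites("".join(codons))
        if not hits:
            return "".join(codons)
        start, pat = min(hits)
        fixed = False
        # Try each codon overlapping the first hit, and each synonymous alternative
        for ci in range(start // 3, (start + len(pat) - 1) // 3 + 1):
            for alt in SYNONYMS[CODON_TABLE[codons[ci]]]:
                trial = codons[:ci] + [alt] + codons[ci + 1:]
                # Accept only if the total hit count drops, which guarantees termination
                if alt != codons[ci] and len(find_sites("".join(trial))) < len(hits):
                    codons, fixed = trial, True
                    break
            if fixed:
                break
        if not fixed:
            raise ValueError(f"no synonymous fix for {pat} at position {start}")


print(scrub("ATGGGTCTCAAA"))
```

Searching both strands matters because a BsaI site can appear as GAGACC on the top strand. Residues with a single codon (Met, Trp) cannot be swapped, which is why the loop raises an error instead of silently leaving a site in place.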
Calibration
What you can trust, and what you cannot.
Libraries guided by a protein language model (PLM) do not guarantee functional variants. They make screening much more efficient. Here is what we actually see in retrospective benchmarks across published single-protein evolution campaigns.
| Mutations per variant | Functional retention |
|---|---|
| 1 to 2 | 70 – 85 % |
| 3 to 4 | 50 – 70 % |
| 5 to 6 | 30 – 55 % |
| 7 to 8 | 15 – 40 % |
| 9 or more | typically < 25 % |
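To turn a retention range into screening terms, multiply it by the number of variants you plan to assay. Take a hypothetical 96-variant plate in the 3-to-4-mutation band: 96 × 0.50 = 48 and 96 × 0.70 ≈ 67. Roughly 48 to 67 of those variants would then be expected to retain function, if the benchmark ranges carry over to your protein. The helper below (a hypothetical name) applies the same arithmetic to any band with both bounds given.

```python
# Retention ranges (fractions) read off the calibration table;
# "9 or more" has only an upper bound, so it is left out here.
RETENTION = {
    "1-2": (0.70, 0.85),
    "3-4": (0.50, 0.70),
    "5-6": (0.30, 0.55),
    "7-8": (0.15, 0.40),
}


def expected_functional(n_variants, band):
    """Expected number of function-retaining variants as a (low, high) range."""
    low, high = RETENTION[band]
    return round(n_variants * low), round(n_variants * high)


print(expected_functional(96, "3-4"))
```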
References
Built on open models and standard biology.
- i. Lin, Z., Akin, H., Rao, R., et al. (2023). Evolutionary-scale prediction of atomic-level protein structure. Science 379:1123–1130. [ESM-2 model architecture]
- ii. Meier, J., Rao, R., Verkuil, R., et al. (2021). Language models enable zero-shot prediction of the effects of mutations on protein function. Adv. Neural Inf. Process. Syst. 34. [Wild-type marginal scoring]
- iii. Allawi, H. T. & SantaLucia, J. (1997). Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry 36:10581–10594. [Nearest-neighbor primer Tm]
Open
Free to use.
Your sequences stay private.
The core pipeline runs without sending anything off-server. BLAST and AlphaFold lookups are optional and must be enabled per run.