Syntheogenesis · A method · v 0.1 · 2026

We did away with
UV mutagenesis.

The chemical and ultraviolet techniques used to evolve enzymes in vitro share a workplace hazard: they expose the operator to mutagens. Syntheogenesis removes both from the workflow.


Open the engine · Read the method below

Runs in the browser · ESM-2 35 M / 650 M / 3 B · no telemetry

Why this exists

UV mutagenesis is a workplace hazard.
We removed it from the workflow.

In many labs, directed evolution still leans on broad-spectrum UV-C light or chemical mutagens (EMS, NTG, hydroxylamine) to generate diversity. Both methods can harm the operator, both are slow to iterate, and both produce libraries that are mostly junk: most random mutations are functionally inert, mildly destabilizing, or outright destructive.

  1. i

    Zero UV exposure

    Mutation design moves from a wet bench into a Python process. No UV-C chamber sessions, no photokeratitis risk, no institutional sign-off for radiation use.

  2. ii

    Targeted, not random

    Every mutation is one that a protein language model (ESM-2, 35 M parameters by default, configurable up to 3 B) scored as evolutionarily plausible. Empirical hit rates are five- to fifty-fold higher than with equivalent-cost random mutagenesis.

  3. iii

    Traceable from day zero

    Each variant ships with its mutation list, predicted fitness, codon-optimized DNA, primer pair, and PCR conditions. No Sanger-sequencing rounds to figure out what changed.

How it works

Four stages, one minute.

The pipeline is intentionally short. Long methods sections in this field tend to hide engineering decisions that matter. Here are the four that do.

  1. 01

    Parse and translate.

    FASTA, SnapGene .dna, GenBank, EMBL, raw DNA, or raw protein — all auto-detected. Plasmid uploads surface a CDS picker so you choose the right gene. Raw DNA with multiple in-frame stops triggers a six-frame ORF scan with a frame-aware picker.

  2. 02

    Zero-shot scoring with ESM-2.

    For every position and each of the 19 possible substitutions, compute ΔLL = log P(mutant | x_WT) − log P(WT | x_WT), where x_WT is the unmasked wild-type sequence. This is the wild-type marginal scheme of Meier et al. (2021). The default model is ESM-2 35 M, configurable up to 3 B on GPU.

  3. 03

    Combinatorial search.

    Simulated annealing over the top-percentile pool of single-site mutations. Multi-restart with geometric cooling. Cumulative ΣΔLL serves as the objective. Stop-codon and duplicate-position penalties are baked in.

  4. 04

    Codon-optimize and clean.

    Reverse-translate with E. coli, yeast, or human codon-usage tables. Synonymously scrub BsaI, BsmBI, and NotI recognition sites so the library drops straight into Golden Gate. Outputs CSV, Excel, GenBank, FASTA, or JSON.
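
The combinatorial search in stage 03 can be sketched as follows. This is a minimal illustration, not the engine's actual code: the pool contents, the penalty size, the cooling parameters, and every function name (`objective`, `propose`, `anneal`, `multi_restart`) are assumptions made for the example. It shows multi-restart simulated annealing with geometric cooling, using cumulative ΣΔLL as the objective and penalizing stop codons and duplicate positions.

```python
import math
import random

# Hypothetical top-percentile pool: (position, mutant residue) -> ΔLL.
# Real values would come from the ESM-2 scoring stage; these are made up.
POOL = {
    (10, "A"): 1.2,
    (10, "V"): 0.9,
    (25, "L"): 0.7,
    (30, "*"): 2.0,  # a stop codon, included only to exercise the penalty
    (42, "K"): 1.5,
    (58, "E"): 0.4,
    (77, "S"): 1.1,
}

def objective(variant, penalty=5.0):
    """Cumulative ΔLL minus penalties for stops and repeated positions."""
    total = sum(POOL[m] for m in variant)
    positions = [pos for pos, _ in variant]
    duplicates = len(positions) - len(set(positions))
    stops = sum(1 for _, aa in variant if aa == "*")
    return total - penalty * (duplicates + stops)

def propose(variant, rng):
    """Replace one randomly chosen mutation with a pool entry not already used."""
    candidates = [m for m in POOL if m not in variant]
    new = list(variant)
    new[rng.randrange(len(new))] = rng.choice(candidates)
    return tuple(new)

def anneal(n_mut, rng, t_start=1.0, t_end=0.01, alpha=0.95, steps_per_temp=20):
    """One annealing run; the temperature is multiplied by alpha each round."""
    current = tuple(rng.sample(sorted(POOL), n_mut))
    best = current
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            cand = propose(current, rng)
            delta = objective(cand) - objective(current)
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current = cand
                if objective(current) > objective(best):
                    best = current
        t *= alpha
    return best

def multi_restart(n_mut, restarts=5, seed=0):
    """Run several independent anneals and keep the highest-scoring variant."""
    rng = random.Random(seed)
    return max((anneal(n_mut, rng) for _ in range(restarts)), key=objective)

best = multi_restart(n_mut=3)
print(sorted(best), round(objective(best), 2))
```

Because the objective is additive, a real pool of this size could be solved by sorting; annealing earns its keep once the pool is large and the penalties make the landscape non-trivial.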

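The synonymous site scrubbing in stage 04 can be sketched in the same spirit. The recognition sequences for BsaI (GGTCTC), BsmBI (CGTCTC), and NotI (GCGGCCGC) are standard; everything else here is an assumption for illustration, including the greedy strategy, the function names, and the example sequence. Unlike the engine, this sketch ignores codon-usage tables and simply takes the first synonymous codon that lowers the site count.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC", "NotI": "GCGGCCGC"}

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Sites can sit on either strand, so search for each site and its reverse complement.
PATTERNS = {p for site in SITES.values() for p in (site, revcomp(site))}

def count_sites(dna):
    """Number of recognition-site occurrences on both strands (overlaps included)."""
    return sum(dna.startswith(p, i) for p in PATTERNS for i in range(len(dna)))

def translate(dna):
    """Translate an in-frame coding sequence whose length is a multiple of three."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))

def scrub(dna):
    """Greedily swap in synonymous codons while each swap lowers the site count."""
    dna = dna.upper()
    improved = True
    while count_sites(dna) > 0 and improved:
        improved = False
        for i in range(0, len(dna), 3):
            codon = dna[i:i + 3]
            for alt, aa in CODON_TO_AA.items():
                if alt == codon or aa != CODON_TO_AA[codon]:
                    continue  # skip the same codon and non-synonymous codons
                trial = dna[:i] + alt + dna[i + 3:]
                if count_sites(trial) < count_sites(dna):
                    dna, improved = trial, True
                    break
    return dna

# Made-up coding sequence carrying one BsaI site and one NotI site.
cds = "ATGGGTCTCGCGGCCGCATAA"
clean = scrub(cds)
print(clean, count_sites(clean), translate(clean) == translate(cds))
```

A greedy pass like this can stall if no single synonymous swap removes a site (for example, a site spanning only methionine and tryptophan codons), in which case the loop exits and the remaining sites are left for manual review.
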
Calibration

What you can trust, and what you cannot.

PLM-guided libraries do not guarantee functional variants. They make screening drastically more efficient. Here is what we actually see in retrospective benchmarks across published single-protein evolution campaigns.

Table 1. Approximate fraction of variants retaining ≥ 50 % of wild-type activity, by mutation count.
Mutations per variant    Functional retention
1 to 2                   70 – 85 %
3 to 4                   50 – 70 %
5 to 6                   30 – 55 %
7 to 8                   15 – 40 %
9 or more                typically < 25 %

References

Built on open models and standard biology.

  1. i. Lin, Z., Akin, H., Rao, R., et al. (2023). Evolutionary-scale prediction of atomic-level protein structure. Science 379:1123–1130. [ESM-2 model architecture]
  2. ii. Meier, J., Rao, R., Verkuil, R., et al. (2021). Language models enable zero-shot prediction of the effects of mutations on protein function. Adv. Neural Inf. Process. Syst. 34. [Wild-type marginal scoring]
  3. iii. Allawi, H. T. & SantaLucia, J. (1997). Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry 36:10581–10594. [Nearest-neighbor primer Tm]

Open

Free to use.
Your sequences stay private.

The core pipeline runs without sending anything off-server. Optional BLAST and AlphaFold lookups are opt-in per run.



Hosted on Hugging Face Spaces · sleeps after 48 h of inactivity · cold start ~30 s