Computational toxicology · ICH M7

Unbury the data.
Unbury the animals.

AI platform for computational toxicology. Starting with ICH M7 mutagenicity assessment for pharma. Built to replace animal safety testing with software predictions.

Data on more than 100 million compounds already exists, fragmented across ToxCast, ChEMBL, PubChem, Tox21. Assemble it into a working regulatory assessment and the animal test becomes optional. Unbury the data. Unbury the animals.

Animals in US research (estimate)

The estimate includes the mice, rats, fish, and birds that USDA's ~2M regulated-species count does not track. Every Unbury assessment aims to move that number the other way.

Impurities per small-molecule drug program
50–200+

Industry quotes put each wet-lab Ames test at roughly $5K–$15K and several weeks. In software, it takes minutes.

ICH M7 allowable intake (lifetime)
1.5 μg/day

The threshold of toxicological concern: a theoretical cancer risk below 1 in 100,000.

01 · The moment

The door that opened in 2022.

For 84 years, US law required drugs to be tested on animals. That changed quietly on December 29, 2022. The ripples are still arriving.

Statute
1938

FDCA mandates animal testing

The Federal Food, Drug, and Cosmetic Act is passed following the 1937 sulfanilamide elixir disaster. A 1962 amendment adds "preclinical tests (including tests on animals)." The requirement stands for 60 years.

Prohibition
2013

EU bans cosmetic animal testing

EU cosmetics law prohibits testing finished cosmetics and their ingredients on animals. The final stage, a ban on marketing cosmetics whose ingredients were tested on animals, took effect on 11 March 2013. ~44 countries now hold similar bans.

Reform
2022

FDA Modernization Act 2.0

S.5002 signed 29 December. Replaces "preclinical tests (including tests on animals)" with "nonclinical tests," explicitly including computer models and organ-chip/cell-based assays.

Reform
2023

ICH M7(R2) adds CPCA

ICH M7(R2) formalises the Carcinogenic Potency Categorisation Approach for N-nitrosamine impurities. FDA and EMA set compliance deadlines following the valsartan / ranitidine / metformin recall waves of 2018-22.

Target
2035

EPA target: zero mammalian study requests

Target first set by EPA in 2019, rescinded 2020-21, and restored in January 2026 by Administrator Zeldin. If held, the regulatory transition from animal to computational safety science is substantively complete by 2035.

The regulatory door opened. The software behind it is from the 1990s. Nobody has walked through with modern tools.

The operating premise of this product
02 · Overview

The whole thing, on one screen.

Four acts. Thirteen chapters. Everything Unbury is, why it matters, how it works, and what's at stake, laid out so you can see it all before you scroll.

Act I · 01 / 04

Context

Why now is different from every prior moment in US drug safety law.

1938 · 2022 · 2035
Act II · 02 / 04

The Science

Computational toxicology from first principles. QSAR, SMILES, structural alerts, statistical models, the ICH M7 classification.

Act III · 03 / 04

The Tools

The current toxicologist's workflow. The landscape of software that exists. What Unbury does. Where this goes next.

7 silos · days → Unbury: 1 system · minutes
Act IV · 04 / 04

Depth

The stack behind the predictions. And the fact at the center of the whole thing.

mol = Chem.MolFromSmiles("CC(=O)...")
fp = GetMorganFingerprint(mol, 2)
pred = model.predict([fp])  # → mutagenic? 0 / 1

Not a pitch. An explainer. Each chapter is self-contained — skip around, or read top to bottom.

03 · The science, from zero

What computational toxicology is.

Predict how a chemical behaves in the body from its structure alone. No wet lab, no animals. The field has existed for decades; it just never had an integrated product built around it.

Structure viewer · Aspirin

Acetylsalicylic acid — textbook benign structure.

Rendering is done client-side with open chem tooling. A real mutagenicity prediction requires the trained model, covered in §04.

Every molecule has a text representation. Every text representation can be parsed, featurized, and fed to a model. That is the whole trick.

A QSAR model is a function: molecule → probability of toxicity. Modern tooling (RDKit, DeepChem, scikit-learn) can run the full pipeline in fewer lines than a typical web backend. The difficulty of the field sits elsewhere: in the data, the validation protocols, the regulatory workflow, and the judgment calls around which predictions to trust.

The vocabulary

QSAR

Quantitative Structure-Activity Relationship

The umbrella concept: predict a biological effect from a chemical's structure alone. Most of computational toxicology is some form of QSAR.

SMILES

Simplified Molecular-Input Line-Entry System

Text notation for molecules. `CC(=O)Oc1ccccc1C(=O)O` is aspirin. Think of it as JSON for chemistry.

Fingerprint

Morgan / ECFP / MACCS keys

A binary vector encoding structural features of a molecule. The ML model eats fingerprints, not molecules.

Structural alert

SMARTS-encoded substructure

A known-dangerous fragment. If a molecule contains it, flag it. Regex for chemistry.

Applicability domain

Training distribution

The chemical space the model was trained on. A model trained on drugs can't reliably predict the toxicity of pesticides.

Read-across

Neighbor-based inference

Predict a compound's toxicity from similar compounds with known data. k-nearest neighbors for molecules.
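Fingerprints, read-across, and the applicability domain all reduce to one operation: comparing fingerprints. A minimal sketch, assuming a fingerprint is represented as the set of indices of its "on" bits and using Tanimoto similarity, the usual choice for binary fingerprints. The bit sets and labels are invented toy data, and the 0.35 cut-off simply echoes the threshold in the `predict.py` snippet later in this document.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity: shared on-bits divided by all distinct on-bits."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints share no features
    return len(a & b) / len(union)


def read_across(query: frozenset, labelled: list, k: int = 3) -> str:
    """k-nearest-neighbour read-across: majority label among the k most similar compounds."""
    nearest = sorted(labelled, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def in_applicability_domain(query: frozenset, training: list, threshold: float = 0.35) -> bool:
    """In domain if at least one training fingerprint is similar enough to the query."""
    return max(tanimoto(query, fp) for fp in training) >= threshold


# Toy data: hypothetical bit sets, not real fingerprints
query = frozenset({1, 2, 3, 4})
labelled = [
    (frozenset({1, 2, 3, 4, 5}), "mutagenic"),   # similarity 0.80
    (frozenset({1, 2, 3}), "mutagenic"),         # similarity 0.75
    (frozenset({1, 9}), "non-mutagenic"),        # similarity 0.20
    (frozenset({7, 8, 9}), "non-mutagenic"),     # similarity 0.00
]
print(read_across(query, labelled))                                # mutagenic
print(in_applicability_domain(query, [fp for fp, _ in labelled]))  # True
```

Real pipelines get the bit sets from a toolkit such as RDKit; the similarity and voting logic is the same.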

04 · The two-engine architecture

How a prediction actually happens.

ICH M7 doesn't want one answer. It wants two, from complementary methodologies. Here is what each looks like, to the level of detail that matters.

What we're replacing

The Ames test

Developed by Bruce Ames at UC Berkeley in the 1970s. Histidine-dependent strains of Salmonella typhimurium are exposed to the test chemical on a medium with only trace histidine. Only bacteria in which a mutation restores histidine synthesis grow into visible colonies. More revertant colonies, more mutagenicity.

Standard ICH-recognised protocol (OECD Test Guideline 471): four S. typhimurium strains (TA98, TA100, TA1535, TA1537) plus one E. coli strain (WP2 uvrA), each tested with and without rat-liver S9 metabolic activation — ten conditions per compound. 48 to 72 hours of incubation at 37°C. Industry quotes put a full assay at roughly $5K-$15K per compound and several weeks of calendar time.
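The ten conditions are the cross product of five strains and two activation states. A minimal sketch that enumerates them; the plate count assumes three replicate plates per condition at a single dose, the same simplification the plate-displacement figure in §05 uses. A full TG 471 study also runs a dose series plus controls, so real plate counts are higher.

```python
from itertools import product

# Strain panel as listed in the protocol description above
STRAINS = ["TA98", "TA100", "TA1535", "TA1537", "WP2 uvrA"]
ACTIVATION = ["-S9", "+S9"]  # without / with rat-liver S9 metabolic activation


def ames_conditions() -> list:
    """Every (strain, activation) pair: 5 strains x 2 states = 10 conditions."""
    return list(product(STRAINS, ACTIVATION))


def plates_per_compound(replicates: int = 3) -> int:
    """Plate count under the simplifying assumption of one dose per condition."""
    return len(ames_conditions()) * replicates


for strain, s9 in ames_conditions():
    print(f"{strain:9s} {s9}")
print(plates_per_compound())  # 30
```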

ICH M7's dual-QSAR gate substitutes software for this assay only when both computational predictions return non-mutagenic. Pharma companies can still choose to skip QSAR and go straight to Ames; the regulation permits three assessment pathways. The software route is the cheaper one when a substantial fraction of the manifest is mutagenicity-safe.

See it for yourself

Which parts of a molecule actually fire an alert.

Structural alerts are SMARTS patterns, a regex for chemistry. For the molecule below, here is the alert that matched and why it matters.
Alert detector · 2-amino-4-nitrophenol (C6H6N2O3)
Alert fired

Aromatic primary amine

SMARTS: c-[NH2]

Activates metabolically via N-hydroxylation to form DNA-reactive nitrenium species. Benzidine-class carcinogens share this substructure.

Context

An aromatic intermediate common in dye and pharmaceutical synthesis. Two classical Ames-positive substructures in one ring.

Other alerts on this molecule: aromatic nitro group
Engine A · Rule-based

Structural alerts

Chemistry has a catalogue of fragments known to damage DNA. Encode each as a SMARTS pattern, match against the query molecule, report every hit. Interpretable by design: the output is a list of named danger patterns.

c[N+](=O)[O-]
Aromatic nitro group
Reduces to a reactive nitrenium ion; classical Ames-positive signal.
N=N
Azo linkage
Cleaves to reactive amines in gut flora.
N-N=O
N-nitroso group
The nitrosamine class. CPCA scrutiny since 2023.
C1OC1
Epoxide
Electrophilic ring strain; reacts with DNA bases.
c-[NH2]
Aromatic primary amine
Activates via N-hydroxylation.

Reference sets: Benigni-Bossa, FDA-published alerts, OECD QSAR Toolbox profilers. Lhasa's Derek Nexus covers 40+ endpoints across 40+ years of curated expert rules.
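The rule-based engine reduces to a catalogue plus a substructure matcher. A minimal sketch that keeps the five alerts from the table (rationales abbreviated; the nitro pattern is written in the charge-separated form RDKit produces when it parses a nitro group) and takes the matcher as a parameter. With RDKit, that matcher would be `lambda mol, smarts: mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))`; the demo swaps in a stub that looks up a hypothetical precomputed set of matched patterns, so the sketch runs without a chemistry toolkit.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Alert:
    name: str
    smarts: str
    rationale: str


# Catalogue mirroring the alert table (rationales abbreviated)
CATALOGUE = [
    Alert("Aromatic nitro group", "c[N+](=O)[O-]", "Reduces to a reactive nitrenium ion"),
    Alert("Azo linkage", "N=N", "Cleaves to reactive amines in gut flora"),
    Alert("N-nitroso group", "N-N=O", "Nitrosamine class"),
    Alert("Epoxide", "C1OC1", "Electrophilic ring strain; reacts with DNA bases"),
    Alert("Aromatic primary amine", "c-[NH2]", "Activates via N-hydroxylation"),
]


def fire_alerts(mol, catalogue: List[Alert], matches: Callable[[object, str], bool]) -> List[Alert]:
    """Return every alert whose pattern occurs in the molecule, in catalogue order."""
    return [alert for alert in catalogue if matches(mol, alert.smarts)]


# Stub "molecule": the patterns a real matcher would find in 2-amino-4-nitrophenol (assumed)
present = {"c[N+](=O)[O-]", "c-[NH2]"}
hits = fire_alerts(present, CATALOGUE, lambda mol, smarts: smarts in mol)
for alert in hits:
    print(f"{alert.name}: {alert.rationale}")
```

Returning the alert objects rather than a yes/no keeps the output interpretable: each hit carries its name and rationale into the report.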

Engine B · Statistical

Binary classifier on fingerprints

Train a model on thousands of molecules with known Ames outcomes. Given a new molecule, output a probability. The model captures statistical patterns the rule set misses, and vice versa.

01
Input
SMILES string
CC(=O)Oc1ccccc1C(=O)O
02
Featurize
Morgan fingerprint (2048 bits)
[0,1,0,1,0,0,1,1,0,...]
03
Predict
Random forest / XGBoost
P(mutagenic) = 0.04
04
Domain check
Is input in training space?
in_domain = True
05
Output
Binary + confidence
class: non-mutagenic · conf 0.96

On small toxicity benchmarks, traditional ML (random forests or XGBoost on fingerprints) is competitive with graph neural networks; on larger benchmarks GNNs and hybrid ensembles often win. No single architecture dominates. The hard part of the product is not the model.

ICH M7 requires both. If both return non-mutagenic, the impurity is Class 5 and no wet-lab Ames test is needed. If they disagree, an expert review resolves the conflict.
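The gate can be sketched as a small decision function. This is an illustration, not the regulation's text: routing an out-of-domain statistical call to expert review (rather than treating it as negative) is an assumption of this sketch, and the both-positive branch follows the Class 3 rule in the classification table of §06.

```python
from enum import Enum


class Call(Enum):
    NEGATIVE = "non-mutagenic"
    POSITIVE = "mutagenic"
    OUT_OF_DOMAIN = "outside applicability domain"


def dual_qsar_gate(rule_based: Call, statistical: Call) -> str:
    """Combine the two engine calls into the next workflow step (illustrative only)."""
    if rule_based is Call.NEGATIVE and statistical is Call.NEGATIVE:
        return "Class 5: no further action"
    if rule_based is Call.POSITIVE and statistical is Call.POSITIVE:
        return "Both positive: Class 3 unless Ames data exist"
    # Disagreement, or an out-of-domain call (assumption: never counted as negative)
    return "Expert review"


print(dual_qsar_gate(Call.POSITIVE, Call.NEGATIVE))  # Expert review
```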

05 · The regulation at the center

What ICH M7 is, in plain language.

Harmonised guideline between FDA, EMA, and regulators across 15 other jurisdictions. One of the earliest places in pharmaceutical regulation where a dual software prediction formally substitutes for a lab test; precedent now extends to skin sensitisation, fish acute toxicity, and REACH read-across under OECD defined approaches.

When a pharmaceutical company manufactures a drug, the chemical reactions leave residues: solvents, catalysts, intermediates, degradation products. These are called impurities. A typical drug has 50 to 200 of them.

Some impurities might damage DNA. The regulator requires proof, for every single impurity, that it is either safe at the level present, or controlled below a safe level. No exceptions.

The old way: test each impurity in an Ames assay. Industry quotes run roughly $5,000 to $15,000 per compound, and several weeks of calendar time each. Total bill for a late-stage drug runs into the hundreds of thousands of dollars and months of delay. (Per-compound CRO pricing is not indexed in public sources; figures reflect 2024-2026 industry quotes.)

The ICH M7 way: run both computational engines. If both say non-mutagenic, the impurity is Class 5 and the obligation is satisfied without a wet-lab test. A large fraction of impurities clear at this step; lab work concentrates on the remainder. The fraction depends on the chemistry of the program — no program-level prevalence statistic is published.

One drug · 120 hypothetical impurities
illustrative · fraction varies
IMP-004 · Class 2 or 3
O=[N+]([O-])c1ccc(N)cc1
Rule-based
alert fired · aromatic nitro + aromatic amine
Statistical
mutagenic · P = 0.78
Acceptable intake
1.5 μg/day (lifetime TTC) until a compound-specific AI is calculated

Mock outputs. The real engine runs SMILES through the same dual pipeline.

Cleared computationally
Flagged for wet-lab
Old way
Hundreds of $K
per drug program, months of delay
ICH M7 way
Minutes
to filter the mutagenicity-safe majority

Every pharma company filing a small-molecule submission has to satisfy ICH M7. They can take the dual-QSAR route (software) or go straight to wet-lab Ames testing; most mix the two. The regulation does not mandate software; it gives software a formal path to substitute for the lab test when both engines agree it is safe.

What a program actually spends

Ames testing bill · per drug program

Impurities assessed: 143 (range 10–300)
Cost per Ames test: $10,000 (range $5K–$15K)
Based on ~4 weeks per lab campaign at 4-way parallelism. Test plates displaced: 10 ICH-protocol conditions × 3 replicates per compound × QSAR-cleared count. Per-compound range reflects 2024-2026 CRO quotes.
All-wet-lab bill
$1.4M
143 compounds × $10,000
All-wet-lab calendar time
179 weeks
4 wk per compound, 4-way parallel
After Unbury's dual-QSAR gate
129
cleared to Class 5
14
flagged for lab
114 min
screening time
Lab cost avoided
$1.3M
129 × $10,000 displaced
Bacterial test plates displaced
3,870
129 cleared × 10 conditions × 3 replicates
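The bill, cost-avoided, and plate figures follow from a few multiplications. A minimal sketch that reproduces them from the displayed inputs; the cleared count (129) is taken from the display as an input, not predicted, and the calendar-time and screening-time figures are left out because their derivation is not fully stated. The class and field names are illustrative.

```python
from dataclasses import dataclass

CONDITIONS_PER_COMPOUND = 10  # 5 strains x with/without S9
REPLICATES = 3                # plates per condition, as in the display


@dataclass
class ProgramEstimate:
    compounds: int       # impurities in the program
    cleared: int         # cleared to Class 5 by the dual-QSAR gate (input, not computed)
    cost_per_test: int   # USD per Ames test

    @property
    def flagged(self) -> int:
        return self.compounds - self.cleared

    @property
    def all_wet_lab_bill(self) -> int:
        return self.compounds * self.cost_per_test

    @property
    def cost_avoided(self) -> int:
        return self.cleared * self.cost_per_test

    @property
    def plates_displaced(self) -> int:
        return self.cleared * CONDITIONS_PER_COMPOUND * REPLICATES


est = ProgramEstimate(compounds=143, cleared=129, cost_per_test=10_000)
print(est.flagged, est.all_wet_lab_bill, est.cost_avoided, est.plates_displaced)
# 14 1430000 1290000 3870
```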
Why ICH M7 gained a new chapter in 2023

The nitrosamine recalls.

Valsartan was recalled starting 13 July 2018. Then losartan and irbesartan (NMBA / NDEA). Ranitidine (Zantac) began recalls in October 2019, withdrawn entirely April 2020. Metformin ER lots withdrawn in 2020. Varenicline (Chantix) all lots recalled in 2021. The cause in every case: N-nitrosamine impurities above acceptable daily intake.

Nitrosamines were previously considered negligible trace residues. Regulators discovered they could form during ordinary manufacturing or storage. Hundreds of drug products were pulled from pharmacy shelves in the following four years.

ICH M7(R2) · published 2023

CPCA — Carcinogenic Potency Categorisation Approach

A nitrosamine-specific risk framework. Classify the N-nitroso structure against known potency drivers (α-hydrogen count, steric environment, activating substituents) and assign an acceptable intake limit. Every pharma manufacturer now has to run this on every potentially-forming nitrosamine in every drug product.

Recall waves
2018–22
valsartan → Zantac → metformin
AIs within CPCA
18 ng/day
strictest CPCA potency band
Deadline
Active
FDA + EMA compliance ongoing

Unbury handles CPCA as a specialised track within the ICH M7 workflow. Same dual-engine shape, nitrosamine-specific rule set, nitrosamine-specific intake limits, same CTD output.

06 · The classification system

Every impurity lands in one of five buckets.

ICH M7 specifies the decision rule for each class. The computational prediction determines which branch applies.

Class 1

Decision rule

Positive Ames + positive rodent carcinogenicity

Regulatory outcome

Control below compound-specific limit

Example

Aflatoxin B1, N-nitrosodimethylamine (NDMA) in well-characterised contexts

Class 2

Decision rule

Positive Ames, no carcinogenicity data

Regulatory outcome

Control below TTC-based limits

Example

Aromatic primary amines with positive Ames in the literature

Class 3

Decision rule

Alert detected, no Ames data available

Regulatory outcome

Control at TTC or run the Ames test. If negative → Class 5. If positive → Class 2.

Example

Novel intermediates with nitro, epoxide, or azo substructures

Class 4

Decision rule

Alert detected, same alert in the drug substance itself (which tested negative)

Regulatory outcome

Treat as non-mutagenic

Example

Degradants carrying the same alert as a cleared active pharmaceutical ingredient (API)

Class 5

Decision rule

Both QSAR engines negative, or an alert with sufficient data showing no mutagenicity

Regulatory outcome

No further action

Example

The bulk of a typical impurity manifest after dual-QSAR screening; the fraction varies with program chemistry
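The five buckets amount to a decision function over the evidence available for an impurity. A minimal sketch, assuming evidence arrives as an Ames result, a rodent carcinogenicity result, and whether either QSAR engine raised an alert, plus a flag for an alert shared with an Ames-negative drug substance; `None` stands for "no data". Treating a positive Ames result without positive carcinogenicity data as Class 2 is a simplification.

```python
from typing import Optional


def ich_m7_class(
    ames_positive: Optional[bool],
    rodent_carcinogen: Optional[bool],
    alert: bool,
    alert_in_negative_api: bool = False,
) -> int:
    """Map available evidence to an ICH M7 class (1-5). None means no data."""
    if ames_positive:
        # Measured mutagen; Class 1 only with positive rodent carcinogenicity data
        return 1 if rodent_carcinogen else 2
    if ames_positive is False:
        return 5  # a negative Ames result outweighs a structural alert
    if not alert:
        return 5  # no data, and both QSAR engines negative
    if alert_in_negative_api:
        return 4  # same alert as an Ames-negative drug substance
    return 3      # alert, no data: control at TTC or run an Ames test


print(ich_m7_class(ames_positive=None, rodent_carcinogen=None, alert=True))  # 3
```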

The TTC staircase

Allowable intake depends on how long the patient takes the drug.

ICH M7's Threshold of Toxicological Concern derives from linear back-extrapolation of rodent TD50 data to a theoretical lifetime cancer risk below 1 in 100,000. The shorter the exposure, the higher the allowable daily intake, because less time means less accumulated risk at the same exposure level.

Intake limit calculator
ICH M7 §7
Treatment duration: 1.0 years (range: 1 day to 25 yr)
Allowable daily intake · this impurity
20 μg/day
> 1 to 12 months

Each limit corresponds to a theoretical lifetime cancer risk below 1 in 100,000 for a 50 kg adult. Values scale for paediatric populations.
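The staircase is a lookup on treatment duration. A minimal sketch using the four bands listed under "Also inside" in §09 (120, 20, 10, and 1.5 μg/day). It also converts an intake into a concentration limit by dividing by the maximum daily dose, the ICH M7 "Option 1" conversion (μg per g equals ppm). The function names are illustrative.

```python
# (upper bound of band in months, acceptable intake in µg/day) for a single impurity
TTC_STAIRCASE = [
    (1, 120.0),    # <= 1 month
    (12, 20.0),    # > 1 to 12 months
    (120, 10.0),   # > 1 to 10 years
]
LIFETIME_AI = 1.5  # > 10 years to lifetime


def allowable_intake_ug_per_day(duration_months: float) -> float:
    """Duration-adjusted acceptable intake for one mutagenic impurity."""
    for upper_bound_months, intake in TTC_STAIRCASE:
        if duration_months <= upper_bound_months:
            return intake
    return LIFETIME_AI


def concentration_limit_ppm(intake_ug_per_day: float, max_daily_dose_g: float) -> float:
    """Impurity limit in the drug substance: µg/day divided by g/day gives µg/g (ppm)."""
    return intake_ug_per_day / max_daily_dose_g


print(allowable_intake_ug_per_day(12))  # 20.0, the 1.0-year setting
print(concentration_limit_ppm(allowable_intake_ug_per_day(300), max_daily_dose_g=0.5))  # 3.0
```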

07 · The assessment lifecycle

How an ICH M7 assessment actually gets done.

Before anyone runs a QSAR model, impurities have to be synthesised into existence, caught by analytical chemistry, and characterised into a structure. After the classification, the result has to survive CTD assembly, agency review, and a 20-year lifecycle of process changes. The seven-step workflow below lives inside one phase of a much longer pipeline.

The six phases

Where a single impurity assessment actually lives.

Phase 3 is where Unbury works. The other five phases are what make this a platform problem rather than a tool problem.
Phase 03 · computational toxicologist + QSAR specialist

Risk classification

Every impurity moves through one of three pathways defined by ICH M7: (1) dual QSAR — if both rule-based and statistical engines return non-mutagenic, the impurity is Class 5 and no wet-lab test is needed; (2) QSAR + Ames — positive predictions trigger a confirmatory wet-lab test; (3) direct Ames — some programs skip computational assessment entirely.

N-nitrosamines move through the specialised CPCA track added in R2 (2023), which assigns an acceptable intake by potency category rather than via the default TTC.

This is the phase Unbury addresses. The seven-step drill-down is shown below.

Typical duration

hours to days per batch

Handoff to the next phase

ICH M7 class + rationale flows into control strategy + report

Reality check

The three pathways mean this is not a linear process but a branching tree. A single impurity can enter Phase 3 multiple times — first as a computational screen, later for follow-up after a model update or a process change.

Zooming into Phase 03 · IMP-0047

Watch one impurity move through the seven-tool workflow.

A single impurity from a realistically sized assessment, followed step by step in calendar time.

Step 01 · Import structure
Tool: MolViewer (desktop)
SMILES: CC(=O)Nc1ccc(O)cc1[N+](=O)[O-]

SMILES arrives from analytical

A degradation-product impurity was isolated and characterised by MS and NMR. Its proposed structure is written as a SMILES string: text notation encoding atoms, bonds, and connectivity. That string is what enters the workflow.

Next: tokenise and parse the SMILES.

Produces: canonical SMILES, 2D structure, and molecular weight (Mw). Next tool: rule-based QSAR.
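Parsing starts with tokenising the SMILES string. A minimal sketch, assuming a regular expression of the kind common in SMILES language-model work (bracket atoms, two-letter halogens, organic-subset atoms, bonds, branches, ring closures), then counting atoms and rings from the tokens. Counting rings as half the ring-closure tokens works because each closure pair adds exactly one bond beyond a spanning tree. Molecular weight is left out: it needs implicit-hydrogen perception, which a toolkit such as RDKit supplies.

```python
import re

# Bracket atoms, Br/Cl, organic-subset atoms, bonds, branches, ring closures (incl. %nn)
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenise(smiles: str) -> list:
    """Split a SMILES string into tokens; reject characters the pattern does not cover."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens


def heavy_atom_count(tokens: list) -> int:
    """Atoms are bracket atoms or element letters (assumes no explicit [H] atoms)."""
    return sum(1 for t in tokens if t.startswith("[") or t.isalpha())


def ring_count(tokens: list) -> int:
    """Each ring-closure label appears twice; each pair closes one ring."""
    return sum(1 for t in tokens if t.isdigit() or t.startswith("%")) // 2


tokens = tokenise("CC(=O)Nc1ccc(O)cc1[N+](=O)[O-]")  # the IMP-0047 SMILES
print(len(tokens), heavy_atom_count(tokens), ring_count(tokens))  # 24 14 1
```

The lossless check (tokens must rejoin to the input) is what turns a silent mis-parse into an explicit error before anything reaches the prediction engines.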

Derek Nexus is one step. Sarah Nexus is another step. Every existing tool is a step. Unbury is the workflow that binds them: one surface, one audit trail, one click.

Reality check

What the workflow doesn't solve, even in theory.

Limitations worth naming. Some are product-design decisions. Others are open scientific problems. A good tool doesn't pretend they aren't there.

01

ICH M7 is single-compound only

Every impurity is assessed individually. Mixture toxicology — how impurities interact with each other or with the drug substance — is outside scope and remains an unsolved scientific problem.

02

Parent compound, not metabolites

ICH M7 assesses the impurity as filed. In-vivo metabolism can produce mutagenic fragments not predicted from the parent structure; those fall under other ICH guidelines (M3(R2), S1A/B, S2(R1)) on separate workflows.

03

Three pathways, not one

Pharma can skip QSAR entirely and run direct Ames; that is also ICH M7 compliant. Software is a cost-reduction path, not a regulatory mandate.

04

Model versions drift

Derek and Sarah update annually; ChEMBL and Tox21 release refreshed corpora. An assessment filed in 2022 may need re-running if a 2026 alert-set update changes a call.

05

Structural ambiguity stalls the whole pipeline

No SMILES, no prediction. Analytical characterisation remains the upstream rate-limiting step, with no amount of QSAR investment able to compensate.

06

International interpretation drift

FDA, EMA, PMDA, Health Canada, and the MHRA each apply ICH M7 with local variations. A filing accepted in one jurisdiction may require supplementary data in another.

08 · The landscape

Everything that already exists.

Incumbents have regulatory credibility. Academic tools have modern models. Nobody is in the modern-and-credible quadrant. That quadrant is what ICH M7 compliance needs now.

Competitive map · axes: architecture (cloud-native AI ↔ 1990s desktop) and regulatory credibility (zero ↔ gold standard). Quadrants: modern · no credibility; modern · credible (target); legacy · no credibility; legacy · credible. Plotted: incumbent (Lhasa), enterprise, academic / open, Unbury.

Unbury

Built 2026
What it is

Dual prediction engine. LLM-powered expert review. CTD reports. Audit trail. Model version tracking. Nitrosamine CPCA. Cloud-native from day zero.

Pricing

Positioned below Lhasa membership

Honest gap

Earning regulatory track record.

Lhasa ships forty years of curated alerts; we would not build Derek Nexus 2.0 even if we could. Lhasa is actively migrating products (Vitic, Mirabilis, Kaptis) to the cloud, so the window on a cloud-native ICH M7 workflow is real but finite. Our bet is on what they have not built yet: integrated LLM expert review, shared audit trails, per-assessment pricing, self-serve onboarding. Their not-for-profit membership model also gives them little incentive to undercut it with per-assessment pricing.

09 · The product

How Unbury works.

One platform for the whole ICH M7 workflow, from SMILES paste to CTD-ready report. Below are three screens that together replace the seven disconnected tools above.

FEATURE / 01

Dual prediction engine

Rule-based and statistical (QSAR) models run in parallel on every compound. Satisfies ICH M7's complementary-methods requirement in a single click. Results side-by-side with the substructures that fired, model confidence, and an applicability-domain flag.

Built on RDKit (structural matching) and scikit-learn / XGBoost (fingerprint-based classifier).

dual prediction · interactive mock
Engine A · Rule-based
MUTAGENIC
Alerts: aromatic nitro + aromatic primary amine.
Engine B · Statistical
MUTAGENIC
P = 0.87 · in domain
Both engines concur. With no Ames data on file, the ICH M7 classification is Class 3: control at the TTC-based limit or run an Ames test.

Mock classifier using simple pattern matching on SMILES strings. The real engine uses RDKit SMARTS matching + a Morgan-fingerprint XGBoost model on Ames training data.

FEATURE / 02 · the workflow step LLMs unlock

LLM-powered expert review

When the two engines disagree, incumbents dump it into an email chain. Unbury routes it through an LLM that analyses the structural alert in context, retrieves analogous compounds from a frozen public corpus, and drafts regulatory-grade reasoning with verified citations. A trained reviewer signs off.

Not frontier research; academic LLM tools for QSAR interpretation already exist (the open-source O-QT Assistant, 2024). What is novel is packaging this as a regulatory-grade, audit-trailed, commercially supported workflow step inside an ICH M7 product.

conflict resolution / IMP-0091
Rule-based
alert fired
Statistical
negative · 0.92
Expert-review flow: engines disagree → analog retrieval → reasoning draft → recommendation.
The regulator's first question

How does the LLM not hallucinate a citation?

Legitimate objection. The answer has to be structural, not hopeful. Four layers, stacked.

01 · Retrieval, not generation

Analog compounds are retrieved from a frozen ChEMBL + Tox21 snapshot via structural similarity search. The LLM can only cite what the retriever actually returned. It never invents a ChEMBL ID.

02 · Citation verification

Every identifier in the draft reasoning is round-tripped against the database before render. Broken or unresolved IDs are surfaced as errors, never silently passed through.

03 · Human signoff required

The LLM drafts. A trained reviewer accepts, edits, or overrides. No prediction exits the system as an LLM-only artefact. The audit trail records the draft, the change, and the signer.

04 · Everything logged, hash-chained

Every prompt, retrieval set, draft, edit, and override is written to an append-only audit table with a hash chain; chain heads commit to external WORM object storage. The reasoning behind any prediction is reproducible end-to-end, satisfying 21 CFR Part 11 §11.10(e).
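Layer 02 can be sketched as a post-processing check: extract every ChEMBL-style identifier from the draft and compare it with what the retriever returned. A minimal sketch, assuming identifiers follow the `CHEMBL` plus digits pattern and letting the retrieval set stand in for the database round-trip; the IDs in the example are placeholders, not references to real compounds.

```python
import re

CHEMBL_ID = re.compile(r"\bCHEMBL\d+\b")


def verify_citations(draft: str, retrieved_ids: set) -> list:
    """Return cited identifiers that the retriever did not return (empty list = all verified)."""
    cited = CHEMBL_ID.findall(draft)
    return sorted({cid for cid in cited if cid not in retrieved_ids})


retrieved = {"CHEMBL111", "CHEMBL222"}  # placeholder IDs
draft = "Analogs CHEMBL111 and CHEMBL222 are Ames-negative; CHEMBL999 supports this."
print(verify_citations(draft, retrieved))  # ['CHEMBL999']
```

Any non-empty result would be surfaced as an error on the draft rather than passed through to the reviewer.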

FEATURE / 03

CTD-formatted reports

Section 3.2.S.3.2 of the Common Technical Document, generated from every assessment. Methods used, results, expert review rationale, classification, intake calculations, control strategy, full audit trail. The format regulators expect in submissions.

Plus model version tracking across the 6–7 year drug development lifecycle — which build of which engine was used for which prediction.

ICH M7 assessment · report preview
# Section 3.2.S.3.2 — Impurities
Drug substance: ABX-4412
Assessment date: 2026-04-16
Methods: Derek-compatible structural alert set v2.3.1; RF QSAR v1.9 (ChEMBL + ToxCast trained).
Impurities assessed: 143
## Summary by class
· Class 5 (no action): 127
· Class 4 (shared alert): 6
· Class 3 (alert, no data): 7
· Class 2 (mutagen, unknown carc.): 3
· Class 1: 0
## Audit trail
Every prediction written to an append-only table with SHA-256 hash chain.
Chain heads committed hourly to S3 Object Lock (WORM) for tamper-evident anchoring.
LLM expert-review drafts logged with reviewer decisions (accept / edit / override).
Engine versions, alert set version, training data manifest captured on every run.
Satisfies 21 CFR Part 11 §11.10(e) and EU Annex 11 §9 tamper-evident requirements.
— End of illustrative sample —

Sample is illustrative. Real CTD output is .docx/.pdf with full audit trail, citations, and regulator-specific formatting.
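The audit-trail lines in the sample describe a SHA-256 hash chain. A minimal sketch of the idea using only the standard library: each entry's hash covers its record and the previous entry's hash, so editing or reordering any entry breaks verification. Truncating the tail is not detectable from the chain alone, which is why chain heads are anchored externally (the S3 Object Lock step). The record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, record: dict) -> dict:
    """Append a record whose hash covers its content and the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; an edited or reordered entry makes verification fail."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_entry(chain, {"impurity": "IMP-0091", "event": "llm_draft"})
append_entry(chain, {"impurity": "IMP-0091", "event": "reviewer_override"})
print(verify_chain(chain))  # True
chain[0]["record"]["event"] = "accepted"  # tamper with an earlier entry
print(verify_chain(chain))  # False
```

In this sketch, `chain[-1]["hash"]` is the value that would be committed to WORM storage as the chain head.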

Also inside

  • Intake limit calculator: 1.5 μg/day (lifetime), 10 μg/day (>1 to 10 yr), 20 μg/day (>1 to 12 mo), 120 μg/day (≤1 mo)
  • Nitrosamine CPCA classification (ICH M7 R2, 2023)
  • Model version tracking across the drug development lifecycle
  • Per-prediction audit trail, signed and timestamped
  • Batch processing via API for pharma workflow integration
  • Applicability-domain flags on every statistical call

What Unbury is not

  • Not Derek Nexus 2.0. Derek is one step in a seven-step workflow; Unbury is the workflow.
  • Not frontier ML. Ames mutagenicity prediction is binary classification on molecular fingerprints; traditional ML and modern GNNs trade wins on different benchmarks, and neither dominates. The hard part of the product is not the model.
  • Not mixture toxicology. ICH M7 assesses one compound at a time. Multi-chemical interaction is a separate open problem.
  • Not a replacement for the toxicologist. A reviewer signs every expert review. The system accelerates judgment; it does not replace it.

10 · Who uses this

The industries that run computational toxicology today.

Context, not target list. Anyone producing, importing, or regulating chemicals in a jurisdiction with modern toxicology law has a use for this work.

Pharmaceutical companies

~50%
Primary use

ICH M7 mutagenicity screening, DILI prediction, hERG cardiotoxicity, ADMET triage

Regulatory driver

FDA, EMA, ICH compliance

One late-stage drug failure from unexpected toxicity costs $500M–$2.6B. Computational screening is cheap insurance.

Chemical manufacturers

15–20%
Primary use

EU REACH compliance, OECD test guidelines, environmental hazard assessment

Regulatory driver

REACH requires registration data on roughly 30,000 chemicals

REACH accepts QSAR predictions as weight-of-evidence contributions.

Cosmetics & personal care

10–15%
Primary use

Ingredient safety assessment for formulations

Regulatory driver

The EU's ban on animal-tested cosmetics took full effect in 2013; about 44 countries now have similar bans.

Fastest-growing segment. Indie / DTC brands launching constantly, each needing safety files.

Food & agriculture

~8–10%
Primary use

Safety assessments for food additives and agrochemicals

Regulatory driver

EFSA, FDA CFSAN, EPA FIFRA

Increasing regulatory pressure on chronic dietary exposure endpoints.

Environmental agencies

~5–8%
Primary use

Chemicals in water, soil, and ecosystems

Regulatory driver

EPA toxicity assessment, state programs

EPA committed to eliminating all mammalian study requests by 2035.

Contract Research Organisations

multiplier
Primary use

Offer computational toxicology as a service to their own clients

Regulatory driver

Downstream pharma demand

One CRO deployment reaches dozens of customers indirectly.

Estimates reflect relative industry spend on computational safety assessment software. Pharma ICH M7 is the initial beachhead; the mechanism generalises across all segments above.

11 · The longer arc

Why ICH M7 is a wedge, not a product.

The same workflow shape — dual engines, LLM expert review, audit trail, CTD output — applies across every regulated toxicity endpoint and every jurisdiction in active non-animal transition. The wedge is one endpoint in one industry. The platform is the rest.

Arc one · one engine, every endpoint

Same shape, different training data.

Each row below is a separate regulated toxicity endpoint. Each is, at its core, the same product: dual QSAR plus LLM expert review plus CTD-grade output. Different dataset, different alert library, different intake calculation. Same engine.

Wedge

Ames mutagenicity

What it assesses

DNA damage via bacterial reverse mutation assay (the Ames test)

Primary regulator

ICH M7 · FDA · EMA · PMDA

Why it fits the platform

The wedge. Regulation formally substitutes QSAR for wet lab; dual-engine is a mandate.

Arc two · one mechanism, every regulated industry

Every regulator in the non-animal transition.

ICH M7 was among the first places where software formally substitutes for a wet-lab test. It is not the last. Other regulated industries and jurisdictions are moving through the same transition on staggered timelines.

ICH M7 · 2014 / 2023

Global pharma

Mutagenic impurities in small-molecule drugs

Active · the wedge
ICH Q3 · D · S1 · S5 · Expanding

Global pharma

Other impurity, carcinogenicity, repro-tox guidelines

Adjacent workflow products
REACH · 2006 →

EU chemicals

~30,000 chemicals requiring safety data

QSAR accepted as weight-of-evidence
EU Cosmetics Reg. · 2013 →

EU cosmetics

Finished products + ingredients

Animal testing banned since 2013 · 44 countries
FIFRA · TSCA · 2035 target

US EPA

Pesticides + industrial chemicals

EPA committed to zero mammal studies by 2035
FDA CFSAN · EFSA · Emerging

Food additives / agrochemicals

Dietary exposure safety assessment

Increasing NAM acceptance
Arc three · what compounds

The data asset that grows with every assessment.

A prediction sold is a prediction spent. The asset we are building is not the prediction itself but the by-product of running it: the overrides, the accepted filings, the analog graph.

01

Expert override dataset

Every customer-opt-in assessment produces a labelled data point: input SMILES, dual-engine predictions, structural alerts, LLM reasoning draft, toxicologist override, final regulator-accepted classification. Over three to five years this becomes a real-world corpus of what QSAR got wrong and why. Pharma IP protections mean this asset compounds only where customers agree to contribute — modelled after the Lhasa Vitic consortium, which is its own proof that opt-in sharing works when structured right.

02

Regulatory track record

Every FDA-accepted submission citing Unbury's assessment is a future-sales asset. Credibility in regulated markets compounds non-linearly: the first ten acceptances are hard-won, the next hundred are routine, and competitors cannot shortcut the filing history. Realistic horizon is three to five years of early-customer submissions before this crosses from promise to moat.

03

Proprietary read-across graph

LLM expert review retrieves structural analogs from public corpora on every conflict. The retrieval decisions themselves — which analog best resolves which alert in which context — become a cross-reference graph that does not exist in any single public database. This piece is vendor-controlled; it does not require customer opt-in because it is derived entirely from public source data.

None of these assets exist at T+0. All three compound from the first assessment onward. The value of the platform at year five is not the predictor at year five — it is the five-year-old dataset of everything the predictor got right and wrong.

Where this ends up

The operating system for non-animal safety science.

Unbury at maturity is the platform where every computational safety assessment is produced and filed. One place for every toxicity endpoint, every regulatory framework in active non-animal transition, every industry whose products cross a regulator's desk. The wedge is pharma mutagenicity because that is where the regulation has formally arrived. The rest is a matter of the transition catching up.

Public comparables in the adjacent space — Certara at ~$1.1B cap on $419M revenue, Simulations Plus at ~$300M on $79M, Veeva at ~$26B on $3.2B — are the shape of the category at maturity. None are operating-system-scale yet. That slot is open.

Strategic landscape · integration and exit

Veeva Systems

NYSE: VEEV · ~$26B cap · $3.2B rev

Complementary. Veeva Vault RIM is the filing cabinet; Unbury generates the scientific evidence that goes inside the filings. Natural integration partner, plausible long-term acquirer.

Certara

NASDAQ: CERT · ~$1.1B cap · $419M rev

Biosimulation giant. Acquired Chemaxon for ~$90M (October 2024). Strategic fit: add Unbury's ICH M7 workflow to their PBPK-led footprint at FDA.

Simulations Plus

NASDAQ: SLP · ~$300M cap · $79M rev

Traditional ML ADMET vendor. Market cap compressed in 2025. Architecturally distant from cloud-native. If they attempt a cloud-native rebuild, Unbury is a natural acquisition target.

Dotmatics / Siemens

Acquired July 2025 · $5.1B enterprise value

Dotmatics (GraphPad Prism, SnapGene, Geneious and other R&D tools) was acquired by Siemens in July 2025 for $5.1B. Siemens now holds a life-sciences R&D platform, and a regulatory-grade safety workflow fits directly into it.

12 · Under the hood

The ML is three lines. The product is everything around them.

For engineers curious about the stack: the prediction itself is startlingly plain. The difficulty sits in the data curation, the validation protocols, the regulatory workflow, and the LLM integration.

python · predict.py · v1.9.0
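from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from sklearn.ensemble import RandomForestClassifier

# the molecule
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# featurize to a 2048-bit Morgan fingerprint (radius 2)
gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp = gen.GetFingerprint(mol)        # ExplicitBitVect, used for similarity
x = gen.GetFingerprintAsNumPy(mol)  # same bits as a NumPy array, used by the model

# predict (model: a RandomForestClassifier trained on ChEMBL + ToxCast Ames data;
# training not shown, e.g. model = RandomForestClassifier().fit(X_train, y_train))
pred = model.predict_proba([x])[0]
# → e.g. [0.96, 0.04]  (P non-mutagenic, P mutagenic; illustrative values)

# applicability domain check (helper and training fingerprints not shown)
in_domain = is_within_training_space(fp, train_set, threshold=0.35)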

That is the core of the statistical engine. The full binary classifier is a few hundred lines. The structural alert engine is RDKit SMARTS matching against a curated pattern library. Neither requires a foundation model, a GPU farm, or frontier research.
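Here is a minimal sketch of SMARTS alert matching with RDKit. The four patterns are simplified, hypothetical stand-ins for alerts of the kind found in Benigni-Bossa-style libraries: aromatic nitro, primary aromatic amine, epoxide, and N-nitroso. A curated library adds exclusion rules, mechanistic notes, and references, all of which this sketch omits.

```python
from rdkit import Chem

# Illustrative, simplified alert patterns (hypothetical; NOT the curated
# Benigni-Bossa or Derek definitions, which carry many more exclusions).
ALERTS = {
    "aromatic_nitro": "[a][N+](=O)[O-]",
    "primary_aromatic_amine": "[c][NX3;H2]",
    "epoxide": "C1OC1",
    "n_nitroso": "[#7][#7]=[#8]",
}

# Compile each SMARTS once; MolFromSmarts returns None for an invalid pattern.
COMPILED = {}
for name, smarts in ALERTS.items():
    patt = Chem.MolFromSmarts(smarts)
    if patt is None:
        raise ValueError(f"invalid SMARTS for {name}: {smarts}")
    COMPILED[name] = patt


def structural_alerts(smiles: str) -> list[str]:
    """Return the names of alerts whose SMARTS pattern matches the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    return [name for name, patt in COMPILED.items() if mol.HasSubstructMatch(patt)]


print(structural_alerts("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin → []
print(structural_alerts("O=[N+]([O-])c1ccccc1"))   # nitrobenzene → ['aromatic_nitro']
print(structural_alerts("CN(C)N=O"))               # NDMA → ['n_nitroso']
```

The matching itself is trivial. The value of a production alert library lies in the pattern curation and the annotations attached to each match.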

That is why open Ames classifiers (ADMETlab 3.0, Chemprop on TDC, VEGA QSAR) sit in the same performance band as Sarah Nexus on published benchmarks, and they are free. Sarah's moat is not accuracy. It is regulatory provenance: decades of FDA/EMA submission history, validated training-data audit trails, documented applicability-domain coverage, and model-version discipline that holds up in a review filed today and re-inspected in 2031.
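Applicability-domain checks can be done many ways. This sketch assumes the `is_within_training_space` helper in the `predict.py` snippet is a nearest-neighbour Tanimoto test. Under that assumption, a query counts as in domain when its most similar training compound reaches `threshold` similarity. Fingerprints are assumed to be sets of on-bit indices; for an RDKit `ExplicitBitVect` that is `set(fp.GetOnBits())`. The code is illustrative, not Unbury's implementation.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def is_within_training_space(fp: set[int], train_set: list[set[int]],
                             threshold: float = 0.35) -> bool:
    """Hypothetical AD rule: in domain if the nearest training neighbour is >= threshold."""
    nearest = max((tanimoto(fp, t) for t in train_set), default=0.0)
    return nearest >= threshold


train_set = [{1, 2, 3, 4}, {10, 11, 12}]
print(is_within_training_space({1, 2, 3, 5}, train_set))  # True: nearest Tanimoto is 0.6
print(is_within_training_space({20, 21}, train_set))      # False: no shared bits
```

A single nearest neighbour is the simplest rule. Averaging over the k nearest neighbours, or using distance in descriptor space, are common alternatives. Whichever rule is chosen, the threshold is what gets documented and pinned per model version.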

The alert side is the deeper gap. Derek Nexus is four decades of hand-curated structural alerts with case histories and references. Open equivalents (Benigni-Bossa, Toxtree) exist and are usable, but the coverage and annotation quality are not yet at Derek's level. Closing that gap is content work, not ML work.

Unbury's work is the wrapper: scaffold-split validation; applicability-domain estimation; an extensible alert library seeded from Benigni-Bossa; LLM-driven expert review with citation resolution; CTD-formatted reports; and model-version pinning across drug development's 6-to-7-year timeline. The result is a competitive model plus the validation stack that makes it regulator-grade.
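Scaffold-split validation groups compounds by molecular scaffold so that close analogues never straddle the train/test boundary. This typically gives a more pessimistic, and more honest, estimate than a random split. Below is a minimal greedy sketch, similar in spirit to DeepChem's `ScaffoldSplitter`. It assumes Bemis-Murcko scaffold SMILES have already been computed, for example with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`. The function name is hypothetical.

```python
from collections import defaultdict


def scaffold_split(scaffolds: list[str], test_fraction: float = 0.2):
    """Split record indices so that no scaffold appears in both train and test.

    `scaffolds[i]` is the precomputed Bemis-Murcko scaffold of record i.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for idx, scaf in enumerate(scaffolds):
        groups[scaf].append(idx)

    # Largest scaffold families go to train first; groups that no longer fit
    # land in test, so the test set is dominated by rarer chemotypes.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_train_target = len(scaffolds) * (1 - test_fraction)

    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= n_train_target:
            train.extend(group)
        else:
            test.extend(group)
    return sorted(train), sorted(test)


train, test = scaffold_split(["A", "A", "A", "B", "B", "C", "D", "E", "F", "G"])
print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```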

ChEMBL 35 compounds
~2.5M
TDC AMES benchmark
7,255

Software concepts, translated

The stack

Backend
  • Python
  • FastAPI
  • Celery
Cheminformatics
  • RDKit
  • DeepChem
ML
  • scikit-learn
  • XGBoost
  • PyTorch (future)
Data
  • ToxCast (~10K chemicals, 700+ assays)
  • ChEMBL 35 (~2.5M)
  • Tox21 (~10K chemicals, ~50M data points)
  • PubChem (119M)
  • DILIrank / LTKB (1,036 drugs)
  • UniTox (2,418 drugs, 8 organs)
  • TDC AMES benchmark (7,255 labelled)
Web
  • Next.js
  • React
  • Tailwind CSS
Platform · MVP
  • Supabase (Postgres + auth)
  • Claude API (expert review)
Platform · GxP production
  • AWS RDS PostgreSQL (dedicated + KMS + VPC)
  • S3 Object Lock (WORM audit anchor)
  • Hash-chained append-only audit schema
  • Validated backup + break-glass procedures
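A hash-chained append-only audit log can be sketched in a few lines. This minimal version assumes SHA-256 over canonical JSON. Each entry commits to the previous entry's hash, so editing, deleting, or reordering any earlier entry breaks verification. The entry layout and payload fields are hypothetical. A production schema would live in Postgres with append-only permissions rather than in a Python list.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the same event always hashes the same.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> dict:
    """Append an event whose hash commits to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev, "payload": payload, "hash": _digest(prev, payload)}
    chain.append(entry)
    return entry


def verify(chain: list[dict]) -> bool:
    """Recompute every link from the genesis value.

    Detects edited, removed, or reordered entries. Truncating the tail is NOT
    detected on its own, which is why the latest hash is anchored externally
    (for example, in WORM object storage).
    """
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"event": "prediction", "smiles": "CCO", "model": "v1.9.0"})
append(log, {"event": "override", "by": "reviewer-1", "m7_class": 5})
print(verify(log))  # True
```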
13 · The fact at the centre

The animal question, stated plainly.

Computational toxicology does not exist because it is cheaper or faster, though it is both. It exists because there is a prior cost nobody wanted to keep paying.

~50M

A working estimate. USDA tracks ~2M animals of regulated species annually (dogs, primates, rabbits, and, since 2023, birds and voluntarily disclosed fish). The ~95% of laboratory animals that are mice, rats, and fish are not tracked in official US statistics; advocacy-group estimates put the full total in the tens of millions. Global 2015 peer-reviewed estimate: ~192 million.

~50M (est.)
animals in US research annually, incl. untracked rodents + fish
192M
animals used in research globally (2015 peer-reviewed estimate)
$5K–$15K
quoted range for a single wet-lab Ames (CRO industry pricing)
Weeks
calendar time for a wet-lab Ames campaign per compound
50–200+
impurities per typical small-molecule drug program
85%
of Americans favour phasing out animal experiments (Morning Consult, 2024)
What an ICH M7 computational assessment prevents

Take an illustrative program with 150 impurities. At industry-quoted $5K-$15K per Ames test, the fully wet-lab bill is roughly $750K to $2.25M (150 × $5K to 150 × $15K), plus several weeks of calendar time per compound at typical parallelism. Dual-QSAR screens out as Class 5 the fraction that both engines call non-mutagenic, leaving only the remainder for the lab. That fraction depends on program chemistry. Either way, the wet-lab delta is the largest line item in the assessment budget.
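Here is the arithmetic behind the illustrative program, as a small sketch. The function name is hypothetical. The 60% cleared fraction in the second call is a made-up input chosen only to show the mechanics; it is not a measured figure.

```python
def wet_lab_cost(n_impurities: int, cost_low: float, cost_high: float,
                 cleared_fraction: float = 0.0) -> tuple[float, float]:
    """Ames spend range for the impurities that still need a wet-lab test.

    `cleared_fraction` is the share assigned Class 5 by two negative (Q)SAR
    predictions; it varies by program chemistry and is an input, not a constant.
    """
    remaining = round(n_impurities * (1 - cleared_fraction))
    return remaining * cost_low, remaining * cost_high


print(wet_lab_cost(150, 5_000, 15_000))       # all wet-lab → (750000, 2250000)
print(wet_lab_cost(150, 5_000, 15_000, 0.6))  # hypothetical 60% cleared → (300000, 900000)
```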

On the training data

The statistical models are trained on historical testing data from ToxCast, ChEMBL, and Tox21, a record that includes animal studies. Those data points came from experiments that are already done. The animals are gone. Using the record of their suffering to keep future animals out of labs is the most redemptive possible use of that data.

On the customers

Unbury's customers are, by definition, companies that currently test on animals. That is why they need Unbury — to stop, or to test fewer, in the places where the regulator now permits computational substitution. The product is abolitionist in its outcome; the commercial path runs through the same companies the outcome is designed to change.

The regulatory framework now permits a software prediction to stand where an animal test used to stand. The question stops being "is this possible?" and becomes "what is the product that makes it routine?"

Incorporation

Delaware Public Benefit Corporation. Stated purpose: reducing and replacing the use of animals in safety testing through computational methods. The PBC charter legally requires the board to weigh that purpose alongside shareholder returns, which protects the mission from being compromised by future capital pressure.

Alignment

No tension between profit and mission. Every dollar of revenue corresponds to assessments that replaced an animal test. There is no version of this product that works for the customer and fails the mission.

Scope

ICH M7 Mutagenicity is the first wedge because it is the one place where regulation formally substitutes software for a lab test. The same mechanism generalises: REACH, cosmetics, food additives, agrochemicals, environmental assessment.

14 · Honest questions

Questions worth asking.

The questions a thoughtful reader is probably already preparing, answered directly and without padding.

No. QSAR mutagenicity prediction has decades of published validation. Published benchmarks on Ames mutagenicity put balanced accuracy roughly in the 82-92% range on held-out benchmark sets. ICH M7 itself codifies the practice: the regulators, not the vendors, decided it was good enough to substitute for a wet-lab test under the right conditions. What is new here is the workflow product around the science, not the science itself. Predictive performance is bounded by the underlying models, some version of which everyone in the field uses.
They are actively migrating to the cloud. Vitic, Mirabilis, and Kaptis are already cloud-deployed, and a Derek Nexus web service exists. What has not moved is the full Derek/Sarah desktop surface and the integrated regulatory workflow on top. The constraints that make this slow are real: decades of desktop customers, a not-for-profit membership pricing model, and an installed base built around on-premise deployment. Adding LLM-driven expert review, shared cross-tenant audit trails, and self-serve per-assessment pricing at the same time means making three architectural bets at once. Our bet is a limited window, not a permanent gap. Lhasa's 40 years of curated alerts are the one piece we would not rebuild even if we could.
FDA Modernization Act 2.0 (December 2022) removed the 1938 animal-testing mandate. ICH M7(R2) (2023) added the CPCA framework for N-nitrosamine impurities. FDA Modernization Act 3.0 passed the US Senate in December 2025. It aims to force regulators to integrate non-animal methods within 12 months of enactment, and as of early 2026 it awaits House action. EPA's OPPT ran its first non-animal cancer evaluations in 2025. The law caught up to the science in a concentrated window: five years ago the scientific methods were ready, but the regulatory permission was not.
No. The work is ML and workflow engineering against public datasets and published regulations. A toxicology advisor provides judgment on edge cases and reviews model output. The Benchling founders did not have biology PhDs before building a $6.1B life-sciences platform; they built software and hired domain experts. The same pattern applies here.
Out of scope for the wedge. ICH M7 assesses each impurity individually, and the Ames test we are computationally replicating is a single-compound assay. Mixture toxicology is a real, unsolved scientific problem that will matter long-term for cosmetics formulations and environmental assessment. It does not need to be solved to deliver the ICH M7 product.
Published QSAR models on Ames mutagenicity achieve roughly 85–92% balanced accuracy on held-out benchmark sets, depending on the dataset and validation protocol. Performance varies by chemical class and applicability domain. ICH M7 explicitly anticipates this. It requires two complementary methodologies specifically so that single-model errors do not propagate, and it mandates expert review when they disagree. The workflow is designed around the uncertainty, not against it.
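Balanced accuracy is the mean of sensitivity (the share of true mutagens caught) and specificity (the share of non-mutagens correctly cleared). Because it averages the two, a model cannot score well just by predicting the majority class. Here is a dependency-free sketch for binary labels (1 = mutagenic). For binary labels it should agree with scikit-learn's `balanced_accuracy_score`. The toy labels are made up for illustration.

```python
def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true mutagens caught
    specificity = tn / (tn + fp)  # share of non-mutagens correctly cleared
    return (sensitivity + specificity) / 2


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.7917 (plain accuracy would be 0.8)
```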
Every major pharmaceutical company runs ICH M7 assessments, typically with Lhasa's Derek Nexus + Sarah Nexus pair, sometimes with MultiCASE CASE Ultra or tools from Certara or Simulations Plus. Smaller biotechs, generics, and CROs currently either share enterprise licenses, outsource to consultants, or cobble together academic tools. The self-serve middle of the market is unserved.
It lets you skip the lab test under specific conditions: if two complementary computational predictions both return non-mutagenic for a given impurity, the impurity is Class 5 and no further testing is required. That language has been in the guideline since 2014 and was reinforced in the 2017 and 2023 revisions. For impurities that conflict or test positive, additional work is still required, often an Ames test. Computational substitution is partial, not total. But for the impurities that clear both engines, substitution is complete. How many clear depends on program chemistry, and it is often a large majority.
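The dual-negative rule can be sketched as a small decision function. This is a deliberate simplification, and it assumes each engine returns one of three hypothetical calls. It covers only the (Q)SAR-driven path to Class 3 or Class 5. It ignores Classes 1, 2, and 4, which depend on existing experimental data, and it leaves out the expert review the guideline allows on any prediction.

```python
from enum import Enum


class Call(Enum):
    NEGATIVE = "negative"
    POSITIVE = "positive"
    EQUIVOCAL = "equivocal"  # e.g. out of applicability domain or inconclusive


def m7_triage(expert_rule: Call, statistical: Call) -> str:
    """Simplified triage of two complementary (Q)SAR calls (sketch, not guideline text)."""
    if expert_rule is Call.NEGATIVE and statistical is Call.NEGATIVE:
        return "Class 5: no further testing required"
    if expert_rule is Call.POSITIVE and statistical is Call.POSITIVE:
        return "Class 3: control at TTC or run an Ames test"
    # Any disagreement or equivocal call goes to a toxicologist.
    return "expert review"


print(m7_triage(Call.NEGATIVE, Call.NEGATIVE))  # Class 5: no further testing required
print(m7_triage(Call.POSITIVE, Call.NEGATIVE))  # expert review
```

In this sketch, only the dual-negative branch removes the lab test outright. Every other branch still routes to either a control strategy or a human decision.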