Computational toxicology · ICH M7

Unbury the data.
Unbury the animals.

AI platform for computational toxicology. Starting with ICH M7 mutagenicity assessment for pharma. Built to replace animal safety testing with software predictions.

The safety data for 100 million compounds already exists, fragmented across ToxCast, ChEMBL, PubChem, Tox21. Assemble it into a working regulatory assessment and the animal test becomes optional. Unbury the data. Unbury the animals.


Animals in US research (estimate)

Counting the mice, rats, fish, and birds not tracked in USDA's ~2M regulated-species statistic. Every Unbury assessment is one step in the other direction.

Impurities per small-molecule drug program

0–200+

Industry quotes put each wet-lab Ames run at roughly $5K-$15K and several weeks. Or minutes in software.

ICH M7 allowable intake (lifetime)

1.5 μg/day

The threshold of toxicological concern: a theoretical cancer risk below 1 in 100,000.


01 The moment

The door that opened in 2022.

For 84 years, US law required drugs to be tested on animals. That changed quietly on December 29, 2022. The ripples are still arriving.

Statute

1938

FDCA mandates animal testing

The Federal Food, Drug, and Cosmetic Act is passed following the 1937 sulfanilamide elixir disaster. A 1962 amendment adds "preclinical tests (including tests on animals)." The requirement stands for 60 years.

Prohibition

2013

EU bans cosmetic animal testing

The EU Cosmetics Regulation prohibits testing finished cosmetics and their ingredients on animals. Full ban in force 11 March 2013. ~44 countries now hold similar bans.

Reform

2022

FDA Modernization Act 2.0

S.5002 signed 29 December. Replaces "preclinical tests (including tests on animals)" with "nonclinical tests," explicitly including computer models and organ-chip/cell-based assays.

Reform

2023

ICH M7(R2) adds CPCA

ICH M7(R2) formalises the Carcinogenic Potency Categorisation Approach for N-nitrosamine impurities. FDA and EMA set compliance deadlines following the valsartan / ranitidine / metformin recall waves of 2018-22.

Progress

2025

NAMs funding + EPA's first non-animal cancer evaluations

NIH's CARE programme invests $150M in New Approach Methodologies. EPA's OPPT conducts its first non-animal cancer evaluations on di(2-ethylhexyl) phthalate and dibutyl phthalate, sparing ~1,600 animals. FDA Mod Act 3.0 passes the US Senate 16 December; awaits House action.

Progress

2030

Midpoint of the transition

EU Commission working toward phasing out animal testing across all chemical regulations, following the Save Cruelty-Free Cosmetics European Citizens' Initiative (1,413,383 signatures collected, 1,217,916 verified).

Target

2035

EPA target: zero mammalian study requests

Target first set by EPA in 2019, rescinded 2020-21, and restored in January 2026 by Administrator Zeldin. If held, the regulatory transition from animal to computational safety science is substantively complete by 2035.

The regulatory door opened. The software behind it is from the 1990s. Nobody has walked through with modern tools.
The operating premise of this product

02 Overview

The whole thing, on one screen.

Four acts. Thirteen chapters. Everything Unbury is, why it matters, how it works, and what's at stake, laid out so you can see it all before you scroll.

Act I · 01 / 04

Context

Why now is different from every prior moment in US drug safety law.

§01 The Moment
§02 This overview

Act II · 02 / 04

The Science

Computational toxicology from first principles. QSAR, SMILES, structural alerts, statistical models, the ICH M7 classification.

§03 What computational toxicology is
§04 How mutagenicity prediction works
§05 What ICH M7 is
§06 The classification system

Act III · 03 / 04

The Tools

The current toxicologist's workflow. The landscape of software that exists. What Unbury does. Where this goes next.

§07 The current workflow
§08 The current tool landscape
§09 How Unbury works
§10 Who uses computational toxicology
§11 The longer arc

Act IV · 04 / 04

Depth

The stack behind the predictions. And the fact at the center of the whole thing.

§12 Under the hood
§13 The animal question

Not a pitch. An explainer. Each chapter is self-contained — skip around, or read top to bottom.

03 The science, from zero

What computational toxicology is.

Predict how a chemical behaves in the body from its structure alone. No wet lab, no animals. The field has existed for decades; it just never had an integrated product built around it.

Structure viewer · Aspirin

Acetylsalicylic acid — textbook benign structure.

Rendering is done client-side with open chem tooling. A real mutagenicity prediction requires the trained model, shown in §04.

Every molecule has a text representation. Every text representation can be parsed, featurized, and fed to a model. That is the whole trick.

A QSAR model is a function: molecule → probability of toxicity. Modern tooling (RDKit, DeepChem, scikit-learn) can run the full pipeline in fewer lines than a typical web backend. The difficulty of the field sits elsewhere: in the data, the validation protocols, the regulatory workflow, and the judgment calls around which predictions to trust.

The vocabulary

QSAR

Quantitative Structure-Activity Relationship

The umbrella concept: predict a biological effect from a chemical's structure alone. All of computational toxicology is a form of QSAR.

SMILES

Simplified Molecular-Input Line-Entry System

Text notation for molecules. `CC(=O)Oc1ccccc1C(=O)O` is aspirin. Think of it as JSON for chemistry.

Fingerprint

Morgan / ECFP / MACCS keys

A binary vector encoding structural features of a molecule. The ML model eats fingerprints, not molecules.

Structural alert

SMARTS-encoded substructure

A known-dangerous fragment. If a molecule contains it, flag it. Regex for chemistry.

Applicability domain

Training distribution

The chemical space the model was trained on. A model trained on drugs can't predict toxicity of pesticides reliably.

Read-across

Neighbor-based inference

Predict a compound's toxicity from similar compounds with known data. k-nearest neighbors for molecules.
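The fingerprint, read-across, and applicability-domain entries above can be sketched together in plain Python. This is a toy illustration only: fingerprints are modelled as sets of on-bit indices, the four reference compounds and their Ames labels are invented for the example, and the 0.35 similarity cut-off is a placeholder rather than a validated threshold.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical reference set: (name, on-bits, known Ames outcome). Toy values only.
REFERENCE = [
    ("ref-A", {1, 4, 9, 12, 30}, True),
    ("ref-B", {1, 4, 9, 13, 31}, True),
    ("ref-C", {2, 5, 40, 41, 42}, False),
    ("ref-D", {2, 5, 40, 43, 44}, False),
]


def read_across(query: set[int], k: int = 3, domain_threshold: float = 0.35) -> dict:
    """Majority vote over the k most similar references, plus a domain flag."""
    ranked = sorted(REFERENCE, key=lambda ref: tanimoto(query, ref[1]), reverse=True)
    neighbours = ranked[:k]
    positives = sum(1 for _, _, mutagenic in neighbours if mutagenic)
    nearest_similarity = tanimoto(query, neighbours[0][1])
    return {
        "predicted_mutagenic": positives > k / 2,
        "nearest": neighbours[0][0],
        # In domain only if at least one reference compound is similar enough.
        "in_domain": nearest_similarity >= domain_threshold,
    }


close_query = read_across({1, 4, 9, 12, 30, 31})
far_query = read_across({100, 101})
```

The second query shares no bits with any reference compound, so the vote still returns an answer but the domain flag marks it as untrustworthy. That is the practical point of an applicability-domain check.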

04 The two-engine architecture

How a prediction actually happens.

ICH M7 doesn't want one answer. It wants two, from complementary methodologies. Here is what each looks like, to the level of detail that matters.

What we're replacing

The Ames test

Developed by Bruce Ames at UC Berkeley in the 1970s. Histidine-dependent strains of Salmonella typhimurium are exposed to the test chemical. Any mutation that restores the bacteria's ability to make histidine produces a visible colony. More colonies, more mutagenicity.

Standard ICH-recognised protocol (OECD Test Guideline 471): four S. typhimurium strains (TA98, TA100, TA1535, TA1537) plus one E. coli strain (WP2 uvrA), each tested with and without rat-liver S9 metabolic activation — ten conditions per compound. 48 to 72 hours of incubation at 37°C. Industry quotes put a full assay at roughly $5K-$15K per compound and several weeks of calendar time.

ICH M7's dual-QSAR gate substitutes software for this assay only when both computational predictions return non-mutagenic. Pharma companies can still choose to skip QSAR and go straight to Ames; the regulation permits three assessment pathways. The software route is the cheaper one when a substantial fraction of the manifest is mutagenicity-safe.

See it for yourself

Which parts of a molecule actually fire an alert.

SMARTS-encoded regex for chemistry. Click a highlighted region of the structure to see the alert that matched and why.

Alert detector · C6H6N2O3

2-amino-4-nitrophenol

Alert fired

Aromatic primary amine

SMARTS: c-[NH2]

Activates metabolically via N-hydroxylation to form DNA-reactive nitrenium species. Benzidine-class carcinogens share this substructure.

Context

An aromatic intermediate common in dye and pharmaceutical synthesis. Two classical Ames-positive substructures in one ring.


Engine A · Rule-based

Structural alerts

Chemistry has a catalogue of fragments known to damage DNA. Encode each as a SMARTS pattern, match against the query molecule, report every hit. Interpretable by design: the output is a list of named danger patterns.

[N;X3](=O)=O

Aromatic nitro group

Reduces via the hydroxylamine to a reactive nitrenium ion; classical Ames+ signal.

N=N

Azo linkage

Cleaved by gut flora into reactive aromatic amines.

N-N=O

N-nitroso group

The nitrosamine class. CPCA scrutiny since 2023.

C1OC1

Epoxide

Electrophilic ring strain; reacts with DNA bases.

c-[NH2]

Aromatic primary amine

Activates via N-hydroxylation.

Reference sets: Benigni-Bossa, FDA-published alerts, OECD QSAR Toolbox profilers. Lhasa's Derek Nexus covers 40+ endpoints across 40+ years of curated expert rules.

Engine B · Statistical

Binary classifier on fingerprints

Train a model on thousands of molecules with known Ames outcomes. Given a new molecule, output a probability. The model captures statistical patterns the rule set misses, and vice versa.

Input

SMILES string

CC(=O)Oc1ccccc1C(=O)O

Featurize

Morgan fingerprint (2048 bits)

[0,1,0,1,0,0,1,1,0,...]

Predict

Random forest / XGBoost

P(mutagenic) = 0.04

Domain check

Is input in training space?

in_domain = True

Output

Binary + confidence

class: non-mutagenic · conf 0.96

On small toxicity benchmarks, traditional ML (random forest, XGBoost on fingerprints) is competitive with graph neural networks; on larger benchmarks GNNs and hybrid ensembles often win. No single architecture dominates. The hard part of the product is not the model.

ICH M7 requires both. If both return non-mutagenic, the impurity is Class 5 and no wet-lab Ames test is needed. If they disagree, an expert review resolves the conflict.
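The gate described above is small enough to write down. The sketch below is one possible encoding, not Unbury's implementation: the `EngineResult` type and the `dual_qsar_gate` function are hypothetical names, and sending out-of-domain statistical calls to expert review is an assumption of this example rather than something the guideline text quoted here spells out.

```python
from dataclasses import dataclass


@dataclass
class EngineResult:
    mutagenic: bool          # the engine's call for this impurity
    in_domain: bool = True   # applicability-domain flag (statistical engine)


def dual_qsar_gate(rule_based: EngineResult, statistical: EngineResult) -> str:
    """Route one impurity once both predictions are available."""
    if not statistical.in_domain:
        # Assumption: an out-of-domain statistical call is not trusted on its own.
        return "expert review"
    if rule_based.mutagenic != statistical.mutagenic:
        return "expert review"
    if not rule_based.mutagenic:
        return "Class 5: no wet-lab Ames needed"
    return "follow-up: control at TTC or confirm with Ames"


both_negative = dual_qsar_gate(EngineResult(False), EngineResult(False))
conflict = dual_qsar_gate(EngineResult(True), EngineResult(False))
both_positive = dual_qsar_gate(EngineResult(True), EngineResult(True))
out_of_domain = dual_qsar_gate(EngineResult(False), EngineResult(False, in_domain=False))
```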

05 The regulation at the center

What ICH M7 is, in plain language.

Harmonised guideline between FDA, EMA, and regulators across 15 other jurisdictions. One of the earliest places in pharmaceutical regulation where a dual software prediction formally substitutes for a lab test; precedent now extends to skin sensitisation, fish acute toxicity, and REACH read-across under OECD defined approaches.

When a pharmaceutical company manufactures a drug, the chemical reactions leave residues: solvents, catalysts, intermediates, degradation products. These are called impurities. A typical drug has 50 to 200 of them.

Some impurities might damage DNA. The regulator requires proof, for every single impurity, that it is either safe at the level present, or controlled below a safe level. No exceptions.

The old way: test each impurity in an Ames assay. Industry quotes run roughly $5,000 to $15,000 per compound, and several weeks of calendar time each. Total bill for a late-stage drug runs into the hundreds of thousands of dollars and months of delay. (Per-compound CRO pricing is not indexed in public sources; figures reflect 2024-2026 industry quotes.)

The ICH M7 way: run both computational engines. If both say non-mutagenic, the impurity is Class 5 and the obligation is satisfied without a wet-lab test. A large fraction of impurities clear at this step; lab work concentrates on the remainder. The fraction depends on the chemistry of the program — no program-level prevalence statistic is published.

One drug · 120 hypothetical impurities

illustrative · fraction varies

IMP-004 · Class 2 or 3

O=[N+]([O-])c1ccc(N)cc1

Rule-based

alert fired · aromatic nitro + aromatic amine

Statistical

mutagenic · P = 0.78

Acceptable intake

1.5 μg/day (lifetime TTC) until a compound-specific AI is calculated

Mock outputs — click any other cell to re-classify. The real engine runs SMILES through the same dual pipeline.

Cleared computationally

Flagged for wet-lab

Old way

Hundreds of $K

per drug program, months of delay

ICH M7 way

Minutes

to filter the mutagenicity-safe majority

Every pharma company filing a small-molecule submission has to satisfy ICH M7. They can take the dual-QSAR route (software) or go straight to wet-lab Ames testing; most mix the two. The regulation does not mandate software; it gives software a formal path to substitute for the lab test when both engines agree it is safe.

What a program actually spends

Ames testing bill · per drug program

ICH M7 filter · indicative only

Impurities in the program: 143 (slider: 10 to 300)

Wet-lab Ames cost per compound: $10,000 (slider: $5K to $15K)

Based on ~4 weeks per lab campaign at 4-way parallelism. Test plates displaced: 10 ICH-protocol conditions × 3 replicates per compound × QSAR-cleared count. Per-compound range reflects 2024-2026 CRO quotes.

All-wet-lab bill

$1.4M

143 compounds × $10,000

All-wet-lab calendar time

179 weeks

4 wk per compound, 4-way parallel

After Unbury's dual-QSAR gate

129 cleared to Class 5

14 flagged for lab

114 min screening time

Lab cost avoided

$1.3M

129 × $10,000 displaced

Bacterial test plates displaced

3,870

129 cleared × 10 conditions × 3 replicates
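The cost and plate figures in this calculator are plain multiplication, reproduced here so the numbers can be checked. The function name `ames_bill` and its parameter names are made up for this sketch; the inputs (143 impurities, 129 cleared, $10,000 per compound, 10 conditions, 3 replicates) are the ones shown above.

```python
def ames_bill(n_impurities: int, n_cleared: int, cost_per_compound: int = 10_000,
              conditions: int = 10, replicates: int = 3) -> dict:
    """Indicative wet-lab spend before and after a dual-QSAR screen."""
    flagged = n_impurities - n_cleared
    return {
        "all_wet_lab_bill": n_impurities * cost_per_compound,
        "lab_cost_avoided": n_cleared * cost_per_compound,
        "remaining_lab_bill": flagged * cost_per_compound,
        "flagged_for_lab": flagged,
        "plates_displaced": n_cleared * conditions * replicates,
    }


figures = ames_bill(n_impurities=143, n_cleared=129)
```

With these inputs the all-wet-lab bill is $1,430,000, the avoided cost is $1,290,000, and 3,870 plates are displaced, which round to the $1.4M and $1.3M shown in the calculator.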

Why ICH M7 gained a new chapter in 2023

The nitrosamine recalls.

Valsartan was recalled starting 13 July 2018. Then losartan and irbesartan (NMBA / NDEA). Ranitidine (Zantac) began recalls in October 2019, withdrawn entirely April 2020. Metformin ER lots withdrawn in 2020. Varenicline (Chantix) all lots recalled in 2021. The cause in every case: N-nitrosamine impurities above acceptable daily intake.

Nitrosamines were previously considered negligible trace residues. Regulators discovered they could form during ordinary manufacturing or storage. Hundreds of drug products were pulled from pharmacy shelves in the following four years.

ICH M7(R2) · published 2023

CPCA — Carcinogenic Potency Categorisation Approach

A nitrosamine-specific risk framework. Classify the N-nitroso structure against known potency drivers (α-hydrogen count, steric environment, activating substituents) and assign an acceptable intake limit. Every pharma manufacturer now has to run this on every potentially-forming nitrosamine in every drug product.

Recall waves

2018–22

valsartan → Zantac → metformin

AIs within CPCA

18 ng/day

strictest CPCA potency band

Deadline

Active

FDA + EMA compliance ongoing

Unbury handles CPCA as a specialised track within the ICH M7 workflow. Same dual-engine shape, nitrosamine-specific rule set, nitrosamine-specific intake limits, same CTD output.

06 The classification system

Every impurity lands in one of five buckets.

ICH M7 specifies the decision rule for each class. The computational prediction determines which branch applies.

Class 1

Decision rule: Positive Ames + positive rodent carcinogenicity

Regulatory outcome: Control below compound-specific limit

Example: Aflatoxin B1, N-nitrosodimethylamine (NDMA) in well-characterised contexts

Class 2

Decision rule: Positive Ames, no carcinogenicity data

Regulatory outcome: Control below TTC-based limits

Example: Aromatic primary amines with positive Ames in the literature

Class 3

Decision rule: Alert detected, no Ames data available

Regulatory outcome: Control at TTC or run the Ames test. If negative → Class 5. If positive → Class 2.

Example: Novel intermediates with nitro, epoxide, or azo substructures

Class 4

Decision rule: Alert detected, same alert in the drug substance itself (which tested negative)

Regulatory outcome: Treat as non-mutagenic

Example: Degradants carrying the same alert as a cleared active pharmaceutical ingredient

Class 5

Decision rule: Both QSAR predictions negative, or sufficient literature data showing no mutagenicity

Regulatory outcome: No further action

Example: The bulk of a typical impurity manifest after dual-QSAR screening; the fraction varies with program chemistry
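The five decision rules can be read as a short cascade. The sketch below is a simplified encoding under stated assumptions: `ich_m7_class` and its argument names are hypothetical, `None` stands for "no data", and a positive Ames with negative carcinogenicity data is not covered by the rules listed here, so the sketch conservatively returns Class 2 for that case.

```python
from typing import Optional


def ich_m7_class(ames_positive: Optional[bool],
                 rodent_carcinogen: Optional[bool],
                 alert: bool,
                 alert_shared_with_negative_api: bool = False) -> int:
    """Map the available evidence for one impurity to an ICH M7 class (1-5)."""
    if ames_positive and rodent_carcinogen:
        return 1  # known mutagenic carcinogen
    if ames_positive:
        return 2  # known mutagen, carcinogenic potential not established
    if ames_positive is False:
        return 5  # a negative Ames result overrules a structural alert
    if alert and alert_shared_with_negative_api:
        return 4  # same alert as the drug substance, which tested negative
    if alert:
        return 3  # alert with no Ames data: control at TTC or run the assay
    return 5      # no alert from either engine


classes = [
    ich_m7_class(True, True, alert=True),
    ich_m7_class(True, None, alert=True),
    ich_m7_class(None, None, alert=True),
    ich_m7_class(None, None, alert=True, alert_shared_with_negative_api=True),
    ich_m7_class(None, None, alert=False),
]
```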

The TTC staircase

Allowable intake depends on how long the patient takes the drug.

ICH M7's Threshold of Toxicological Concern derives from linear back-extrapolation of rodent TD50 data to a theoretical lifetime cancer risk below 1 in 100,000. The shorter the exposure, the higher the allowable daily intake, because less time means less accumulated risk at the same exposure level.
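The linear back-extrapolation is a one-line calculation. TD50 is the dose giving a 50% tumour incidence, so scaling linearly from a 1-in-2 risk down to 1 in 100,000 means dividing by 50,000, then multiplying by body weight. The sketch below assumes a 50 kg adult; the function name and the TD50 value of 20 mg/kg/day are illustrative, not data for any real compound.

```python
def acceptable_intake_ug_per_day(td50_mg_per_kg_day: float,
                                 body_weight_kg: float = 50.0) -> float:
    """Compound-specific acceptable intake by linear extrapolation from TD50."""
    # 50% incidence -> 1 in 100,000 is a factor of 50,000 under a linear model.
    dose_mg_per_kg_day = td50_mg_per_kg_day / 50_000
    # Scale to the whole body and convert mg to micrograms.
    return dose_mg_per_kg_day * body_weight_kg * 1_000


example_ai = acceptable_intake_ug_per_day(20.0)  # hypothetical TD50
```

With a 50 kg body weight the factors cancel neatly: a TD50 of X mg/kg/day gives an acceptable intake of X μg/day.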

Intake limit calculator · ICH M7 §7

Treatment duration: 1.0 years

Allowable daily intake for this impurity: 20 μg/day (> 1 to 12 months)

Each limit corresponds to a theoretical lifetime cancer risk below 1 in 100,000 for a 50 kg adult. Values scale for paediatric populations.
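The staircase itself is a lookup table. A minimal sketch, using the four limits quoted on this page (120, 20, 10, and 1.5 μg/day); the function and constant names are made up for the example, and durations are expressed in months for simplicity.

```python
# (upper bound of treatment duration in months, allowable intake in ug/day)
TTC_STAIRCASE = [
    (1, 120.0),    # up to 1 month
    (12, 20.0),    # > 1 to 12 months
    (120, 10.0),   # > 1 to 10 years
]
LIFETIME_LIMIT = 1.5  # > 10 years to lifetime


def allowable_intake_ug_per_day(duration_months: float) -> float:
    """Return the less-than-lifetime limit for a given treatment duration."""
    if duration_months <= 0:
        raise ValueError("treatment duration must be positive")
    for upper_bound, limit in TTC_STAIRCASE:
        if duration_months <= upper_bound:
            return limit
    return LIFETIME_LIMIT


one_year = allowable_intake_ug_per_day(12)
```

A treatment duration of 1.0 years is 12 months, which falls in the "> 1 to 12 months" band and returns 20 μg/day, matching the calculator readout above.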

07 The assessment lifecycle

How an ICH M7 assessment actually gets done.

Before anyone runs a QSAR model, impurities have to be synthesised into existence, caught by analytical chemistry, and characterised into a structure. After the classification, the result has to survive CTD assembly, agency review, and a 20-year lifecycle of process changes. The seven-step workflow below lives inside one phase of a much longer pipeline.

The six phases

Where a single impurity assessment actually lives.

Phase 3 is where Unbury works. The rest is what makes this a platform problem, not a tool one.

Phase 03 · computational toxicologist + QSAR specialist

Risk classification

Every impurity moves through one of three pathways defined by ICH M7: (1) dual QSAR — if both rule-based and statistical engines return non-mutagenic, the impurity is Class 5 and no wet-lab test is needed; (2) QSAR + Ames — positive predictions trigger a confirmatory wet-lab test; (3) direct Ames — some programs skip computational assessment entirely.

N-nitrosamines move through the specialised CPCA track added in R2 (2023), which assigns an acceptable intake by potency category rather than via the default TTC.

This is the phase Unbury addresses. The seven-step drill-down is shown below.

Typical duration

hours to days per batch

Handoff to the next phase

ICH M7 class + rationale flows into control strategy + report

Reality check

The three pathways mean this is not a linear process but a branching tree. A single impurity can enter Phase 3 multiple times — first as a computational screen, later for follow-up after a model update or a process change.

Zooming into Phase 03 · IMP-0047

Watch one impurity move through the seven-tool workflow.

A single impurity from a real-sized assessment, played in calendar time. Press play, or toggle the Unbury path to see the same assessment on one surface.


Step 01

Import structure

MolViewer · desktop

SMILES: CC(=O)Nc1ccc(O)cc1[N+](=O)[O-]

SMILES arrives from analytical

A degradation-product impurity was isolated by MS + NMR. Its proposed structure is written as a SMILES, a text notation encoding atoms, bonds, and connectivity. That string is what enters the workflow.

Produces: canonical SMILES + 2D structure + Mw

Next: tokenise and parse the SMILES, then rule-based QSAR

Derek Nexus is one step. Sarah Nexus is another step. Every existing tool is a step. Unbury is the workflow that binds them: one surface, one audit trail, one click.

Reality check

What the workflow doesn't solve, even in theory.

Limitations worth naming. Some are product-design decisions. Others are open scientific problems. A good tool doesn't pretend they aren't there.

ICH M7 is single-compound only

Every impurity is assessed individually. Mixture toxicology — how impurities interact with each other or with the drug substance — is outside scope and remains an unsolved scientific problem.

Parent compound, not metabolites

ICH M7 assesses the impurity as filed. In-vivo metabolism can produce mutagenic fragments not predicted from the parent structure; those fall under other ICH guidelines (Q3C, S1A/B, S2(R1)) on separate workflows.

Three pathways, not one

Pharma can skip QSAR entirely and run direct Ames; that is also ICH M7 compliant. Software is a cost-reduction path, not a regulatory mandate.

Model versions drift

Derek and Sarah update annually; ChEMBL and Tox21 release refreshed corpora. An assessment filed in 2022 may need re-running if a 2026 alert-set update changes a call.

Structural ambiguity stalls the whole pipeline

No SMILES, no prediction. Analytical characterisation remains the upstream rate-limiting step, and no amount of QSAR investment can compensate for it.

International interpretation drift

FDA, EMA, PMDA, Health Canada, and the MHRA each apply ICH M7 with local variations. A filing accepted in one jurisdiction may require supplementary data in another.

08 The landscape

Everything that already exists.

Incumbents have regulatory credibility. Academic tools have modern models. Nobody is in the upper-right quadrant. That quadrant is what ICH M7 compliance needs now.

Competitive map

Axes: architecture (cloud-native AI ↔ 1990s desktop) and regulatory credibility (zero ↔ gold standard).

Quadrants: modern, no credibility · modern, credible (target) · legacy, no credibility · legacy, credible

Legend: incumbent (Lhasa) · enterprise · academic / open · Unbury

Target

Unbury

Built 2026

What it is

Dual prediction engine. LLM-powered expert review. CTD reports. Audit trail. Model version tracking. Nitrosamine CPCA. Cloud-native from day zero.

Pricing

Positioned below Lhasa membership

Honest gap

Earning regulatory track record.

Lhasa ships forty years of curated alerts; we would not build Derek Nexus 2.0 even if we could. They are actively migrating products to cloud (Vitic, Mirabilis, Kaptis), so the window on cloud-native ICH M7 workflow is real but finite. Our bet is on what they have not built yet: integrated LLM expert review, shared audit trails, per-assessment pricing, self-serve onboarding. Their not-for-profit membership structure disincentivises undercutting those prices.

09 The product

How Unbury works.

One platform for the whole ICH M7 workflow, from SMILES paste to CTD-ready report. Below are three screens that together replace the seven disconnected tools above.

FEATURE / 01

Dual prediction engine

Rule-based and statistical (QSAR) models run in parallel on every compound. Satisfies ICH M7's complementary-methods requirement in a single click. Results side-by-side with the substructures that fired, model confidence, and an applicability-domain flag.

Built on RDKit (structural matching) and scikit-learn / XGBoost (fingerprint-based classifier).

dual prediction · interactive mock

edit SMILES or pick a preset

Engine A · Rule-based

MUTAGENIC

Alerts: aromatic nitro + aromatic primary amine.

Engine B · Statistical

MUTAGENIC

P = 0.87 · in domain

→ Both engines concur. Control below TTC-based limits. ICH M7 classification: Class 2.

Mock classifier using simple pattern matching on SMILES strings. The real engine uses RDKit SMARTS matching + a Morgan-fingerprint XGBoost model on Ames training data.
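For a sense of what "simple pattern matching on SMILES strings" means, here is a toy version. It is deliberately naive and is not the page's actual mock: the regexes only recognise the specific SMILES spellings listed, they know nothing about aromaticity or ring context, and the pattern names and the `toy_alerts` function are invented for this sketch. Real alert matching needs a chemistry toolkit such as RDKit.

```python
import re

# Toy patterns over raw SMILES text. Each regex recognises only the literal
# spellings shown here, so a differently written SMILES for the same molecule
# can slip through. That weakness is why the real engine uses SMARTS matching.
TOY_ALERTS = {
    "nitro group": re.compile(r"\[N\+\]\(=O\)\[O-\]|O=\[N\+\]\(\[O-\]\)"),
    "aromatic primary amine": re.compile(r"^Nc|c\(N\)"),
    "azo linkage": re.compile(r"N=N"),
}


def toy_alerts(smiles: str) -> list[str]:
    """Return the names of every toy pattern found in the SMILES text."""
    return [name for name, pattern in TOY_ALERTS.items() if pattern.search(smiles)]


nitroaniline_hits = toy_alerts("O=[N+]([O-])c1ccc(N)cc1")
aspirin_hits = toy_alerts("CC(=O)Oc1ccccc1C(=O)O")
```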

FEATURE / 02 · the workflow step LLMs unlock

LLM-powered expert review

When the two engines disagree, incumbents dump the conflict into an email chain. Unbury routes it through an LLM that analyses the structural alert in context, retrieves analogous compounds from a frozen public corpus, and drafts regulatory-grade reasoning with verified citations. A trained reviewer signs off.

Not frontier research; academic LLM tools for QSAR interpretation already exist (the open-source O-QT Assistant, 2024). What is novel is packaging this as a regulatory-grade, audit-trailed, commercially supported workflow step inside an ICH M7 product.

conflict resolution / IMP-0091

Rule-based

alert fired

Statistical

negative · 0.92

The LLM expert-review flow steps through four stages: engines disagree → analog retrieval → reasoning draft → recommendation.

The regulator's first question

How does the LLM not hallucinate a citation?

Legitimate objection. The answer has to be structural, not hopeful. Four layers, stacked.

01 · Retrieval, not generation

Analog compounds are retrieved from a frozen ChEMBL + Tox21 snapshot via structural similarity search. The LLM can only cite what the retriever actually returned. It never invents a ChEMBL ID.

02 · Citation verification

Every identifier in the draft reasoning is round-tripped against the database before render. Broken or unresolved IDs are surfaced as errors, never silently passed through.

03 · Human signoff required

The LLM drafts. A trained reviewer accepts, edits, or overrides. No prediction exits the system as an LLM-only artefact. The audit trail records the draft, the change, and the signer.

04 · Everything logged, hash-chained

Every prompt, retrieval set, draft, edit, and override is written to an append-only audit table with a hash chain; chain heads commit to external WORM object storage. The reasoning behind any prediction is reproducible end-to-end, satisfying 21 CFR Part 11 §11.10(e).
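The fourth layer, an append-only log with a hash chain, can be sketched with the standard library alone. This is a minimal in-memory illustration, not the production design: the entry layout, the `append_entry` and `verify_chain` names, and the sample events are assumptions, and committing chain heads to WORM storage is not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, {"type": "llm_draft", "impurity": "IMP-0091"})
append_entry(audit_log, {"type": "reviewer_decision", "decision": "edit"})
chain_ok = verify_chain(audit_log)

audit_log[0]["event"]["impurity"] = "IMP-0000"  # simulate tampering
tampered_ok = verify_chain(audit_log)
```

Because each hash covers the previous one, changing an early entry invalidates that entry and everything after it; anchoring the latest hash somewhere immutable is what stops an attacker from simply recomputing the whole chain.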

FEATURE / 03

CTD-formatted reports

Section 3.2.S.3.2 of the Common Technical Document, generated from every assessment. Methods used, results, expert review rationale, classification, intake calculations, control strategy, full audit trail. The format regulators expect in submissions.

Plus model version tracking across the 6–7 year drug development lifecycle — which build of which engine was used for which prediction.
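Version tracking amounts to storing the engine builds alongside each call and comparing them against whatever is current. A minimal sketch, assuming a record shape of our own invention (`PredictionRecord`, `is_stale`); the version strings echo the illustrative report preview and are not real release numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionRecord:
    impurity_id: str
    alert_set_version: str          # build of the rule-based alert library
    statistical_model_version: str  # build of the statistical classifier
    call: str                       # the classification recorded at the time


def is_stale(record: PredictionRecord, current_alert_set: str, current_model: str) -> bool:
    """True when either engine has moved on since the prediction was made."""
    return (record.alert_set_version != current_alert_set
            or record.statistical_model_version != current_model)


record = PredictionRecord("IMP-0047", "v2.3.1", "v1.9", "non-mutagenic")
unchanged = is_stale(record, current_alert_set="v2.3.1", current_model="v1.9")
after_update = is_stale(record, current_alert_set="v2.4.0", current_model="v1.9")
```

A stale record is only a prompt to re-run the prediction; whether the filed assessment actually needs revising depends on whether the call changes.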

ICH M7 assessment · report preview

# Section 3.2.S.3.2 — Impurities

Drug substance: ABX-4412

Assessment date: 2026-04-16

Methods: Derek-compatible structural alert set v2.3.1; RF QSAR v1.9 (ChEMBL + ToxCast trained).

Impurities assessed: 143

## Summary by class

· Class 5 (no action): 127

· Class 4 (shared alert): 6

· Class 3 (alert, no data): 7

· Class 2 (mutagen, unknown carc.): 3

· Class 1: 0

## Audit trail

Every prediction written to an append-only table with SHA-256 hash chain.

Chain heads committed hourly to S3 Object Lock (WORM) for tamper-evident anchoring.

LLM expert-review drafts logged with reviewer decisions (accept / edit / override).

Engine versions, alert set version, training data manifest captured on every run.

Satisfies 21 CFR Part 11 §11.10(e) and EU Annex 11 §9 tamper-evident requirements.

— End of illustrative sample —

Sample is illustrative. Real CTD output is .docx/.pdf with full audit trail, citations, and regulator-specific formatting.

Also inside

Intake limit calculator — 1.5 μg/day lifetime, 10 μg/day (10yr), 20 μg/day (12mo), 120 μg/day (≤1mo)
Nitrosamine CPCA classification (ICH M7 R2, 2023)
Model version tracking across the drug development lifecycle
Per-prediction audit trail, signed and timestamped
Batch processing via API for pharma workflow integration
Applicability-domain flags on every statistical call

What Unbury is not

x Not Derek Nexus 2.0. Derek is one step in a seven-step workflow; Unbury is the workflow.
x Not frontier ML. Ames mutagenicity is a binary classifier on molecular fingerprints; traditional ML and modern GNNs trade wins on different benchmarks, and neither dominates. The hard part of the product is not the model.
x Not mixture toxicology. ICH M7 assesses one compound at a time. Multi-chemical interaction is a separate open problem.
x Not a replacement for the toxicologist. A reviewer signs every expert review. The system accelerates judgment; it does not replace it.

Regulatory driver

Downstream pharma demand

One CRO deployment reaches dozens of customers indirectly.

Estimates reflect relative industry spend on computational safety assessment software. Pharma ICH M7 is the initial beachhead; the mechanism generalises across all segments above.

11 The longer arc

Why ICH M7 is a wedge, not a product.

The same workflow shape — dual engines, LLM expert review, audit trail, CTD output — applies across every regulated toxicity endpoint and every jurisdiction in active non-animal transition. The wedge is one endpoint in one industry. The platform is the rest.

Arc one · one engine, every endpoint

Same shape, different training data.

Each row below is a separate regulated toxicity endpoint. Each is, at its core, the same product: dual QSAR plus LLM expert review plus CTD-grade output. Different dataset, different alert library, different intake calculation. Same engine.

Wedge

Ames mutagenicity

What it assesses

DNA damage via bacterial reverse mutation assay (the Ames test)

Primary regulator

ICH M7 · FDA · EMA · PMDA

Why it fits the platform

The wedge. Regulation formally substitutes QSAR for wet lab; on the QSAR route, two complementary methods are mandatory.

Arc two · one mechanism, every regulated industry

Every regulator in the non-animal transition.

ICH M7 is one of the first places software formally substitutes for a wet-lab test. It is not the last. Other regulated industries and jurisdictions are moving through the same transition on staggered timelines.

ICH M7 · 2014/2023

Global pharma

Mutagenic impurities in small-molecule drugs

Active · the wedge

ICH Q3 · D · S1 · S5 · Expanding

Global pharma

Other impurity, carcinogenicity, repro-tox guidelines

Adjacent workflow products

REACH · 2006 →

EU chemicals

~30,000 chemicals/yr requiring safety data

QSAR accepted as weight-of-evidence

EU Cosmetics Reg. · 2013 →

EU cosmetics

Finished products + ingredients

Animal testing banned since 2013 · 44 countries

FIFRA · TSCA · 2035 target

US EPA

Pesticides + industrial chemicals

EPA committed to zero mammal studies by 2035

FDA CFSAN · EFSA · Emerging

Food additives / agrochemicals

Dietary exposure safety assessment

Increasing NAM acceptance

Arc three · what compounds

The data asset that grows with every assessment.

A prediction sold is a prediction spent. The asset we are building is not the prediction itself but the by-product of running it: the overrides, the accepted filings, the analog graph.

Expert override dataset

Every customer-opt-in assessment produces a labelled data point: input SMILES, dual-engine predictions, structural alerts, LLM reasoning draft, toxicologist override, final regulator-accepted classification. Over three to five years this becomes a real-world corpus of what QSAR got wrong and why. Pharma IP protections mean this asset compounds only where customers agree to contribute — modelled after the Lhasa Vitic consortium, which is its own proof that opt-in sharing works when structured right.

Regulatory track record

Every FDA-accepted submission citing Unbury's assessment is a future-sales asset. Credibility in regulated markets compounds non-linearly: the first ten acceptances are hard-won, the next hundred are routine, and competitors cannot shortcut the filing history. Realistic horizon is three to five years of early-customer submissions before this crosses from promise to moat.

Proprietary read-across graph

LLM expert review retrieves structural analogs from public corpora on every conflict. The retrieval decisions themselves — which analog best resolves which alert in which context — become a cross-reference graph that does not exist in any single public database. This piece is vendor-controlled; it does not require customer opt-in because it is derived entirely from public source data.

None of these assets exist at T+0. All three compound from the first assessment onward. The value of the platform at year five is not the predictor at year five — it is the five-year-old dataset of everything the predictor got right and wrong.

Where this ends up

The operating system for non-animal safety science.

Unbury at maturity is the platform where every computational safety assessment is produced and filed. One place for every toxicity endpoint, every regulatory framework in active non-animal transition, every industry whose products cross a regulator's desk. The wedge is pharma mutagenicity because that is where the regulation has formally arrived. The rest is a matter of the transition catching up.

Public comparables in the adjacent space — Certara at ~$1.1B cap on $419M revenue, Simulations Plus at ~$300M on $79M, Veeva at ~$26B on $3.2B — are the shape of the category at maturity. None are operating-system-scale yet. That slot is open.

Strategic landscape · integration and exit

Veeva Systems

NYSE: VEEV · ~$26B cap · $3.2B rev

Complementary. Veeva Vault RIM is the filing cabinet; Unbury generates the scientific evidence that goes inside the filings. Natural integration partner, plausible long-term acquirer.

Certara

NASDAQ: CERT · ~$1.1B cap · $419M rev

Biosimulation giant. Acquired Chemaxon for ~$90M (October 2024). Strategic fit: add Unbury's ICH M7 workflow to their PBPK-led footprint at FDA.

Simulations Plus

NASDAQ: SLP · ~$300M cap · $79M rev

Traditional ML ADMET vendor. Market cap compressed in 2025. Architecturally distant from cloud-native; acquisition target if they attempt a rebuild.

Dotmatics / Siemens

Acquired July 2025 · $5.1B enterprise value

Dotmatics (IDBS + Prism + SnapGene) acquired by Siemens in July 2025 for $5.1B. Siemens now holds a life-sciences R&D platform; a regulatory-grade safety workflow fits directly.

12 Under the hood

The ML is three lines. The product is everything around them.

For engineers curious about the stack: the prediction itself is startlingly plain. The difficulty sits in the data curation, the validation protocols, the regulatory workflow, and the LLM integration.

python · predict.py · v1.9.0

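from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier  # type of the fitted `model` used below

# `model` (a fitted RandomForestClassifier), `train_set` (training fingerprints) and
# `is_within_training_space` are defined elsewhere in the codebase and not shown here.

# the molecule
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# featurize to a 2048-bit fingerprint
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

# predict (model trained on ChEMBL + ToxCast Ames data)
# list(fp) turns the bit vector into a plain 0/1 feature row
pred = model.predict_proba([list(fp)])[0]
# → [0.96, 0.04]  (non-mutagenic, mutagenic)

# applicability domain check
in_domain = is_within_training_space(fp, train_set, threshold=0.35)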

That is the core of the statistical engine. The binary classifier is a few hundred lines. The structural alert engine is RDKit SMARTS matching against a curated pattern library. Neither requires a foundation model, a GPU farm, or frontier research.
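
The snippet calls `is_within_training_space` without showing it. One common way to write such a check is nearest-neighbour Tanimoto similarity, sketched below. This is an assumption about what the helper might do, not Unbury's implementation: fingerprints are represented as plain sets of on-bit indices so the example runs without RDKit (with RDKit, something like `fp.GetOnBits()` would supply them), and reading `threshold` as a minimum similarity is also an assumption.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_within_training_space(fp, train_set, threshold=0.35):
    """True if the most similar training fingerprint reaches `threshold` similarity."""
    nearest = max((tanimoto(fp, t) for t in train_set), default=0.0)
    return nearest >= threshold


# Toy fingerprints, for illustration only.
train_set = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
query_close = frozenset({1, 2, 3, 9})   # shares 3 of 5 distinct bits with the first entry
query_far = frozenset({20, 21, 22})     # shares no bits with anything in train_set

close_ok = is_within_training_space(query_close, train_set)
far_ok = is_within_training_space(query_far, train_set)
```

A query that fails the check is one the model has no nearby training examples for, so its prediction would be flagged rather than trusted.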

Which is why open Ames classifiers — ADMETlab 3.0, Chemprop on TDC, VEGA QSAR — sit in the same performance band as Sarah Nexus on published benchmarks, free. Sarah's moat is not accuracy. It is regulatory provenance: decades of FDA/EMA submission history, validated training-data audit trails, documented applicability-domain coverage, and model-version discipline that holds up in a review filed today and re-inspected in 2031.

The alert side is the deeper gap. Derek Nexus is four decades of hand-curated structural alerts with case histories and references. Open equivalents (Benigni-Bossa, Toxtree) exist and are usable, but the coverage and annotation quality are not yet at Derek's level. Closing that gap is content work, not ML work.

Unbury's work is the wrapper. Scaffold-split validation; applicability-domain estimation; an extensible Benigni-Bossa-seeded alert library; LLM-driven expert review with citation resolution; CTD-formatted reports; model-version pinning across drug development's 6-to-7-year timeline. A competitive model plus the validation stack that makes it regulator-grade.
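
Scaffold-split validation is the least self-explanatory item in that list. The idea is that molecules sharing a core scaffold never land on both sides of the train/test boundary, so the test score reflects unfamiliar chemistry. A minimal sketch, assuming scaffold labels are already computed (in practice they might come from RDKit's `MurckoScaffold` module); the function name, the greedy assignment rule, and the toy records are illustrative assumptions:

```python
from collections import defaultdict


def scaffold_split(records, test_fraction=0.2):
    """Split (smiles, scaffold) pairs so no scaffold appears in both train and test."""
    groups = defaultdict(list)
    for smiles, scaffold in records:
        groups[scaffold].append(smiles)

    # Largest scaffold families are considered first; whole families move together.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_budget = len(records) * (1 - test_fraction)

    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= train_budget:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy data: scaffold labels are hand-assigned strings, for illustration only.
records = [
    ("c1ccccc1N", "benzene"), ("c1ccccc1O", "benzene"), ("c1ccccc1C", "benzene"),
    ("c1ccncc1N", "pyridine"), ("c1ccncc1C", "pyridine"),
    ("C1CCCCC1N", "cyclohexane"),
]
train, test = scaffold_split(records, test_fraction=0.2)
```

With these six toy records the two pyridines end up in the test set together, which is the point: a random split would likely have leaked one of them into training.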

ChEMBL 35 compounds

~2.5M

TDC AMES benchmark

7,255

The stack

Backend

Python
FastAPI
Celery

Cheminformatics

RDKit
DeepChem

Machine learning

scikit-learn
XGBoost
PyTorch (future)

Data

ToxCast (~10K chemicals, 700+ assays)
ChEMBL 35 (~2.5M)
Tox21 (~10K chemicals, ~50M data points)
PubChem (119M)
DILIrank / LTKB (1,036 drugs)
UniTox (2,418 drugs, 8 organs)
TDC AMES benchmark (7,255 labelled)

Web

Next.js
React
Tailwind CSS

Platform · MVP

Supabase (Postgres + auth)
Claude API (expert review)

Platform · GxP production

AWS RDS PostgreSQL (dedicated + KMS + VPC)
S3 Object Lock (WORM audit anchor)
Hash-chained append-only audit schema
Validated backup + break-glass procedures
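
For readers unfamiliar with the term, a hash-chained append-only log stores, with every entry, a hash that covers the previous entry's hash. Changing any historical record then breaks every hash after it. The sketch below shows the idea in memory only; the field names and the choice of SHA-256 over canonical JSON are assumptions, not the production schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> dict:
    """Append a new entry whose hash chains to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev_hash": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash from the start; any edited entry makes this False."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"event": "prediction", "model_version": "1.9.0"})
append_entry(log, {"event": "expert_review", "decision": "class 5"})
ok_before = verify_chain(log)

log[0]["payload"]["model_version"] = "2.0.0"  # simulate tampering with history
ok_after = verify_chain(log)
```

Anchoring the latest hash somewhere write-once (the S3 Object Lock item above) is what would stop someone from simply recomputing the whole chain after an edit.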

13 The fact at the centre

The animal question, stated plainly.

Computational toxicology does not exist because it is cheaper or faster, though it is both. It exists because there is a prior cost nobody wanted to keep paying.

A working estimate. USDA tracks ~2M animals of regulated species annually (dogs, primates, rabbits, and, since 2023, birds and voluntarily-disclosed fish). The ~95% of laboratory animals that are mice, rats, and fish are not tracked in official US statistics; advocacy-group estimates put the full total in the tens of millions. Global 2015 peer-reviewed estimate: ~192 million.

~50M (est.)

animals in US research annually, incl. untracked rodents + fish

192M

animals used in research globally (2015 peer-reviewed estimate)

$5K–$15K

quoted range for a single wet-lab Ames (CRO industry pricing)

Weeks

calendar time for a wet-lab Ames campaign per compound

50–200+

impurities per typical small-molecule drug program

85%

of Americans favour phasing out animal experiments (Morning Consult, 2024)

What an ICH M7 computational assessment prevents

Take an illustrative program with 150 impurities. At industry-quoted $5K-$15K per Ames, the fully wet-lab bill is roughly $750K to $2.25M, plus several weeks of calendar time per compound at typical parallelism. Dual-QSAR screens out the mutagenicity-safe fraction at Class 5, leaving only the remainder for the lab. The fraction depends on program chemistry; the wet-lab delta is the largest line item in the assessment budget either way.
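
The arithmetic behind that illustration is simple enough to write down. The sketch below uses the 150-impurity program and the $5K-$15K quoted range from the text; the count of 120 impurities cleared at Class 5 is a purely hypothetical input, since the real fraction depends on program chemistry.

```python
def wet_lab_bill(n_impurities, n_cleared=0, cost_low=5_000, cost_high=15_000):
    """Return the (low, high) Ames bill in dollars for impurities still needing the lab."""
    remaining = n_impurities - n_cleared
    return remaining * cost_low, remaining * cost_high


# Every impurity goes to the wet lab.
full_low, full_high = wet_lab_bill(150)

# Hypothetical: 120 of 150 clear both engines and are assigned Class 5.
screened_low, screened_high = wet_lab_bill(150, n_cleared=120)

saved_low = full_low - screened_low
saved_high = full_high - screened_high
```

Under that hypothetical split the bill drops from $750K-$2.25M to $150K-$450K. Different chemistry moves the numbers, not the shape of the calculation.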

On the training data

The statistical models are trained on historical animal testing data. ToxCast, ChEMBL, Tox21. Those data points came from experiments that are already done. The animals are gone. Using the record of their suffering to keep future animals out of labs is the most redemptive possible use of that data.

On the customers

Unbury's customers are, by definition, companies that currently test on animals. That is why they need Unbury — to stop, or to test fewer, in the places where the regulator now permits computational substitution. The product is abolitionist in its outcome; the commercial path runs through the same companies the outcome is designed to change.

The regulatory framework now permits a software prediction to stand where an animal test used to stand. The question stops being is this possible and becomes what is the product that makes it routine.

Incorporation

Delaware Public Benefit Corporation. Stated purpose: reducing and replacing the use of animals in safety testing through computational methods. The PBC charter legally protects the mission from being compromised by future capital pressure.

Alignment

No tension between profit and mission. Every dollar of revenue corresponds to assessments that replaced an animal test. There is no version of this product that works for the customer and fails the mission.

Scope

ICH M7 mutagenicity is the first wedge because it is the one place where regulation formally substitutes software for a lab test. The same mechanism generalises: REACH, cosmetics, food additives, agrochemicals, environmental assessment.

14 Honest questions

Questions worth asking.

The questions a thoughtful reader is probably already preparing, answered directly and without padding.

No. QSAR mutagenicity prediction has decades of published validation. Published benchmarks on Ames mutagenicity put balanced accuracy in roughly the 82–92% range on held-out benchmark sets. ICH M7 itself codifies the practice: the regulators, not the vendors, decided it was good enough to substitute for a wet-lab test under the right conditions. What is new here is the workflow product around the science, not the science itself. Predictive performance is bounded by the underlying models, which everyone in the field uses some version of.

They are actively migrating to cloud. Vitic, Mirabilis, and Kaptis are already cloud-deployed; a Derek Nexus web service exists. The unmoved product is the full Derek/Sarah desktop surface and the integrated regulatory workflow on top. The constraints that make this slow are real: decades of desktop customers, a not-for-profit membership pricing model, and an installed base built around on-premise deployment. Adding LLM-driven expert review, shared cross-tenant audit trails, and self-serve per-assessment pricing at the same time means three simultaneous architectural bets. Our bet is a limited window, not a permanent gap, and Lhasa's 40 years of curated alerts is the one piece we would not rebuild even if we could.

FDA Modernization Act 2.0 (December 2022) removed the 1938 animal-testing mandate. ICH M7(R2) (2023) added the CPCA nitrosamine framework. FDA Mod Act 3.0 passed the US Senate in December 2025, aiming to force regulators to integrate non-animal methods within 12 months of enactment; the bill awaits House action as of early 2026. EPA's OPPT ran its first non-animal cancer evaluations in 2025. The law caught up to the science in a concentrated window. Five years ago the scientific methods were ready; the regulatory permission was not.

No. The work is ML and workflow engineering against public datasets and published regulations. A toxicology advisor provides judgment on edge cases and reviews model output. The Benchling founders did not have biology PhDs before building a $6.1B life-sciences platform; they built software and hired domain experts. The same pattern applies here.

Out of scope for the wedge. ICH M7 assesses each impurity individually, and the Ames test we are computationally replicating is a single-compound assay. Mixture toxicology is a real, unsolved scientific problem that will matter long-term for cosmetics formulations and environmental assessment. It does not need to be solved to deliver the ICH M7 product.

Published QSAR models on Ames mutagenicity achieve roughly 85–92% balanced accuracy on held-out benchmark sets, depending on the dataset and validation protocol. Performance varies by chemical class and applicability domain. ICH M7 explicitly anticipates this: it requires two complementary methodologies specifically so that single-model errors do not propagate, and it mandates expert review when they disagree. The workflow is designed around the uncertainty, not against it.

Every major pharmaceutical company runs ICH M7 assessments, typically with Lhasa's Derek Nexus + Sarah Nexus pair, sometimes with MultiCASE CASE Ultra or Certara / Simulations Plus tools. Smaller biotechs, generics, and CROs currently either share enterprise licenses, outsource to consultants, or cobble together academic tools. The self-serve middle of the market is unserved.

It lets you skip the lab test under specific conditions: if two complementary computational predictions both return non-mutagenic on a given impurity, the impurity is Class 5 and no further testing is required. That language has been in the guideline since 2014 and was reinforced in the 2017 and 2023 revisions. For impurities that conflict or test positive, additional work is still required, often Ames testing. Computational substitution is partial, not total. But for the impurities that clear both engines (the fraction depends on program chemistry and is often a large majority), the substitution is complete.
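
The decision rule described in these answers can be sketched in a few lines. This is a simplified reading of the text above, not a full encoding of the guideline: the function name, the call labels, and the returned route names are assumptions, and real assessments also weigh applicability domain and existing experimental data.

```python
VALID_CALLS = {"positive", "negative"}


def triage(statistical_call: str, alert_call: str) -> str:
    """Route one impurity based on the two complementary engine calls."""
    for call in (statistical_call, alert_call):
        if call not in VALID_CALLS:
            raise ValueError(f"unexpected call: {call!r}")

    if statistical_call == alert_call == "negative":
        return "class_5"          # both engines clear it: no further testing
    if statistical_call == alert_call == "positive":
        return "follow_up"        # additional work required, often an Ames test
    return "expert_review"        # engines disagree: a toxicologist resolves it


routes = [
    triage("negative", "negative"),
    triage("positive", "positive"),
    triage("negative", "positive"),
]
```

Only the first route removes the lab step outright, which is why substitution is partial at the program level even when it is complete for an individual impurity.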
