FDCA mandates animal testing
The Federal Food, Drug, and Cosmetic Act is passed following the 1937 sulfanilamide elixir disaster. A 1962 amendment adds "preclinical tests (including tests on animals)." The requirement stands for 60 years.
AI platform for computational toxicology. Starting with ICH M7 mutagenicity assessment for pharma. Built to replace animal safety testing with software predictions.
The safety data for 100 million compounds already exists, fragmented across ToxCast, ChEMBL, PubChem, Tox21. Assemble it into a working regulatory assessment and the animal test becomes optional. Unbury the data. Unbury the animals.
Counting the mice, rats, fish, and birds not tracked in USDA's ~2M regulated-species statistic. Every Unbury assessment is one the other way.
Industry quotes put each wet-lab Ames run at roughly $5K-$15K and several weeks. Or minutes in software.
The threshold of toxicological concern: a theoretical cancer risk below 1 in 100,000.
For 84 years, US law required drugs to be tested on animals. That changed quietly on December 29, 2022. The ripples are still arriving.
The regulatory door opened. The software behind it is from the 1990s. Nobody has walked through with modern tools.
The operating premise of this product
Four acts. Thirteen chapters. Everything Unbury is, why it matters, how it works, and what's at stake, laid out so you can see it all before you scroll.
Why now is different from every prior moment in US drug safety law.
Computational toxicology from first principles. QSAR, SMILES, structural alerts, statistical models, the ICH M7 classification.
The current toxicologist's workflow. The landscape of software that exists. What Unbury does. Where this goes next.
The stack behind the predictions. And the fact at the center of the whole thing.
Not a pitch. An explainer. Each chapter is self-contained — skip around, or read top to bottom.
Predict how a chemical behaves in the body from its structure alone. No wet lab, no animals. The field has existed for decades; it just never had an integrated product built around it.
Acetylsalicylic acid — textbook benign structure.
Rendering is done client-side with open chem tooling. A real mutagenicity prediction requires the trained model, shown in §04.
Every molecule has a text representation. Every text representation can be parsed, featurized, and fed to a model. That is the whole trick.
A QSAR model is a function: molecule → probability of toxicity. Modern tooling (RDKit, DeepChem, scikit-learn) can run the full pipeline in fewer lines than a typical web backend. The difficulty of the field sits elsewhere: in the data, the validation protocols, the regulatory workflow, and the judgment calls around which predictions to trust.
The umbrella concept: predict a biological effect from a chemical's structure alone. All of computational toxicology is a form of QSAR.
Text notation for molecules. `CC(=O)Oc1ccccc1C(=O)O` is aspirin. Think of it as JSON for chemistry.
A binary vector encoding structural features of a molecule. The ML model eats fingerprints, not molecules.
A known-dangerous fragment. If a molecule contains it, flag it. Regex for chemistry.
The chemical space the model was trained on. A model trained on drugs can't predict toxicity of pesticides reliably.
Predict a compound's toxicity from similar compounds with known data. k-nearest neighbors for molecules.
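The last three glossary entries (fingerprint, applicability domain, read-across) fit in a few lines of plain Python. The sketch below is illustrative only: fingerprints are modelled as sets of on-bit positions, the training data is invented, and the function names are hypothetical. The 0.35 similarity threshold mirrors the value used in the stack snippet later in this document; it is not a validated cutoff.

```python
from typing import FrozenSet, List, Tuple

# A fingerprint is modelled as the set of bit positions that are switched on.
Fingerprint = FrozenSet[int]
Labelled = Tuple[Fingerprint, bool]  # (fingerprint, is_toxic)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared on-bits divided by all distinct on-bits; 1.0 means identical."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest(query: Fingerprint, training: List[Labelled], k: int = 3) -> List[Tuple[float, bool]]:
    """The k training compounds most similar to the query, best first."""
    scored = [(tanimoto(query, fp), toxic) for fp, toxic in training]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def read_across(query: Fingerprint, training: List[Labelled], k: int = 3) -> float:
    """Similarity-weighted share of toxic compounds among the k neighbours."""
    hits = nearest(query, training, k)
    total = sum(sim for sim, _ in hits)
    return sum(sim for sim, toxic in hits if toxic) / total if total else 0.0


def in_domain(query: Fingerprint, training: List[Labelled], threshold: float = 0.35) -> bool:
    """Applicability-domain flag: is any training compound similar enough?"""
    best = max((tanimoto(query, fp) for fp, _ in training), default=0.0)
    return best >= threshold


# Toy data, invented for illustration only.
TRAINING: List[Labelled] = [
    (frozenset({1, 4, 7, 9}), False),
    (frozenset({1, 4, 7, 12}), False),
    (frozenset({2, 5, 8, 20}), True),
    (frozenset({2, 5, 8, 21}), True),
]
QUERY: Fingerprint = frozenset({1, 4, 7, 10})

print(read_across(QUERY, TRAINING), in_domain(QUERY, TRAINING))
```

A query that shares no bits with any training compound fails the domain check, which is the cue to distrust the prediction rather than report it.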
ICH M7 doesn't want one answer. It wants two, from complementary methodologies. Here is what each looks like, to the level of detail that matters.
Developed by Bruce Ames at UC Berkeley in the 1970s. Histidine-dependent strains of Salmonella typhimurium are exposed to the test chemical. Any mutation that restores the bacteria's ability to make histidine produces a visible colony. More colonies, more mutagenicity.
Standard ICH-recognised protocol (OECD Test Guideline 471): four S. typhimurium strains (TA98, TA100, TA1535, TA1537) plus one E. coli strain (WP2 uvrA), each tested with and without rat-liver S9 metabolic activation — ten conditions per compound. 48 to 72 hours of incubation at 37°C. Industry quotes put a full assay at roughly $5K-$15K per compound and several weeks of calendar time.
ICH M7's dual-QSAR gate substitutes software for this assay only when both computational predictions return non-mutagenic. Pharma companies can still choose to skip QSAR and go straight to Ames; the regulation permits three assessment pathways. The software route is the cheaper one when a substantial fraction of the manifest is mutagenicity-safe.
SMARTS: c-[NH2]
Activates metabolically via N-hydroxylation to form DNA-reactive nitrenium species. Benzidine-class carcinogens share this substructure.
An aromatic intermediate common in dye and pharmaceutical synthesis. Two classical Ames-positive substructures in one ring.
Chemistry has a catalogue of fragments known to damage DNA. Encode each as a SMARTS pattern, match against the query molecule, report every hit. Interpretable by design: the output is a list of named danger patterns.
[N;X3](=O)=O · N=N · N-N=O · C1OC1 · c-[NH2]
Reference sets: Benigni-Bossa, FDA-published alerts, OECD QSAR Toolbox profilers. Lhasa's Derek Nexus covers 40+ endpoints across 40+ years of curated expert rules.
Train a model on thousands of molecules with known Ames outcomes. Given a new molecule, output a probability. The model captures statistical patterns the rule set misses, and vice versa.
CC(=O)Oc1ccccc1C(=O)O
[0,1,0,1,0,0,1,1,0,...]
P(mutagenic) = 0.04
in_domain = True
class: non-mutagenic · conf 0.96
On small toxicity benchmarks traditional ML (random forest, XGBoost on fingerprints) is competitive with graph neural networks; on larger benchmarks GNNs and hybrid ensembles often win. No single architecture dominates. The hard part of the product is not the model.
ICH M7 requires both. If both return non-mutagenic, the impurity is Class 5 and no wet-lab Ames test is needed. If they disagree, an expert review resolves the conflict.
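The dual-engine rule can be sketched as a small decision function. This is an illustration, not the product's logic: the names are hypothetical, routing out-of-domain predictions to expert review is an assumption, and the both-positive branch borrows the Class 3 handling ("control at TTC or run the Ames test") from the class rules later in this document.

```python
from enum import Enum


class Call(Enum):
    NEGATIVE = "non-mutagenic"
    POSITIVE = "mutagenic"


def triage(rule_based: Call, statistical: Call, in_domain: bool = True) -> str:
    """Combine the two engine calls into the next workflow step."""
    if not in_domain:
        # Assumption: a prediction outside the model's chemical space
        # is not trusted on its own and goes to a human.
        return "expert review"
    if rule_based is Call.NEGATIVE and statistical is Call.NEGATIVE:
        return "Class 5: no wet-lab Ames test needed"
    if rule_based is not statistical:
        return "expert review"
    # Both engines flag the structure and no Ames data exists yet.
    return "Class 3: control at TTC or run the Ames test"


print(triage(Call.NEGATIVE, Call.NEGATIVE))
print(triage(Call.POSITIVE, Call.NEGATIVE))
```

Keeping the rule as one pure function makes it easy to log the inputs and the outcome together, which matters once the result has to be defended in a filing.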
Harmonised guideline between FDA, EMA, and regulators across 15 other jurisdictions. One of the earliest places in pharmaceutical regulation where a dual software prediction formally substitutes for a lab test; precedent now extends to skin sensitisation, fish acute toxicity, and REACH read-across under OECD defined approaches.
When a pharmaceutical company manufactures a drug, the chemical reactions leave residues: solvents, catalysts, intermediates, degradation products. These are called impurities. A typical drug has 50 to 200 of them.
Some impurities might damage DNA. The regulator requires proof, for every single impurity, that it is either safe at the level present, or controlled below a safe level. No exceptions.
The old way: test each impurity in an Ames assay. Industry quotes run roughly $5,000 to $15,000 per compound, and several weeks of calendar time each. Total bill for a late-stage drug runs into the hundreds of thousands of dollars and months of delay. (Per-compound CRO pricing is not indexed in public sources; figures reflect 2024-2026 industry quotes.)
The ICH M7 way: run both computational engines. If both say non-mutagenic, the impurity is Class 5 and the obligation is satisfied without a wet-lab test. A large fraction of impurities clear at this step; lab work concentrates on the remainder. The fraction depends on the chemistry of the program — no program-level prevalence statistic is published.
Mock outputs — click any other cell to re-classify. The real engine runs SMILES through the same dual pipeline.
Every pharma company filing a small-molecule submission has to satisfy ICH M7. They can take the dual-QSAR route (software) or go straight to wet-lab Ames testing; most mix the two. The regulation does not mandate software; it gives software a formal path to substitute for the lab test when both engines agree it is safe.
Valsartan was recalled starting 13 July 2018. Then losartan and irbesartan (NMBA / NDEA). Ranitidine (Zantac) began recalls in October 2019, withdrawn entirely April 2020. Metformin ER lots withdrawn in 2020. Varenicline (Chantix) all lots recalled in 2021. The cause in every case: N-nitrosamine impurities above acceptable daily intake.
Nitrosamines were previously considered negligible trace residues. Regulators discovered they could form during ordinary manufacturing or storage. Hundreds of drug products were pulled from pharmacy shelves in the following four years.
A nitrosamine-specific risk framework. Classify the N-nitroso structure against known potency drivers (α-hydrogen count, steric environment, activating substituents) and assign an acceptable intake limit. Every pharma manufacturer now has to run this on every potentially-forming nitrosamine in every drug product.
Unbury handles CPCA (the Carcinogenic Potency Categorization Approach) as a specialised track within the ICH M7 workflow. Same dual-engine shape, nitrosamine-specific rule set, nitrosamine-specific intake limits, same CTD output.
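The final step of a CPCA assessment, going from a potency score to an intake limit, is a lookup. The sketch below is hedged on several points: the category limits and the score-to-category mapping are quoted as assumptions based on FDA's 2023 nitrosamine guidance (EMA publishes a lower Category 1 value) and should be verified against the current regulator tables before use; the structural feature scoring that produces the potency score is not reproduced; all function names are hypothetical.

```python
# Assumed acceptable intakes per potency category, in ng/day.
# Verify against the current FDA/EMA tables; do not rely on these values.
CPCA_AI_NG_PER_DAY = {1: 26.5, 2: 100.0, 3: 400.0, 4: 1500.0, 5: 1500.0}


def category_from_score(potency_score: int) -> int:
    """Map a summed potency score to categories 1-4.

    Category 5 is assigned directly from structure (for example when
    the features needed for metabolic activation are absent), not from
    a score, so it is never returned here.
    """
    if potency_score <= 1:
        return 1
    if potency_score >= 4:
        return 4
    return potency_score


def limit_ppm(category: int, max_daily_dose_mg: float) -> float:
    """Concentration limit in the drug product.

    ng/day divided by mg/day gives ng/mg, which is parts per million.
    """
    return CPCA_AI_NG_PER_DAY[category] / max_daily_dose_mg


category = category_from_score(6)
print(category, limit_ppm(category, max_daily_dose_mg=300.0))
```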
ICH M7 specifies the decision rule for each class. The computational prediction determines which branch applies.
Class 1 · Positive Ames + positive rodent carcinogenicity · Control below compound-specific limit · Aflatoxin B1, N-nitrosodimethylamine (NDMA) in well-characterised contexts
Class 2 · Positive Ames, no carcinogenicity data · Control below TTC-based limits · Aromatic primary amines with positive Ames in the literature
Class 3 · Alert detected, no Ames data available · Control at TTC or run the Ames test. If negative → Class 5. If positive → Class 2. · Novel intermediates with nitro, epoxide, or azo substructures
Class 4 · Alert detected, same alert in the drug substance itself (which tested negative) · Treat as non-mutagenic · Degradants carrying the same alert as a cleared active pharmaceutical ingredient
Class 5 · Dual QSAR both negative, or sufficient literature · No further action · The bulk of a typical impurity manifest after dual-QSAR screening (fraction varies with program chemistry)
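Read in order, the five rows above are ICH M7 Classes 1 through 5, and the decision rule can be sketched as one function over the available evidence. This is a simplified illustration with hypothetical parameter names. It covers only the cases the rows describe; a negative carcinogenicity study on an Ames-positive compound, for instance, is not modelled.

```python
from typing import Optional

ACTIONS = {
    1: "Control below compound-specific limit",
    2: "Control below TTC-based limits",
    3: "Control at TTC or run the Ames test",
    4: "Treat as non-mutagenic",
    5: "No further action",
}


def ich_m7_class(
    ames_positive: Optional[bool],        # None means no Ames data on file
    has_positive_carcinogenicity: bool,   # positive rodent data on file
    alert: bool,                          # a structural alert fired
    alert_cleared_in_drug_substance: bool = False,
) -> int:
    """Assign a class from the evidence, following the five rows above."""
    if ames_positive is True:
        return 1 if has_positive_carcinogenicity else 2
    if ames_positive is False:
        # A negative Ames result resolves an alerting structure to Class 5.
        return 5
    if not alert:
        return 5
    return 4 if alert_cleared_in_drug_substance else 3


cls = ich_m7_class(None, False, alert=True)
print(cls, ACTIONS[cls])
```

Re-running the function after an Ames result arrives reproduces the Class 3 follow-up: negative moves the impurity to Class 5, positive to Class 2.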
ICH M7's Threshold of Toxicological Concern derives from linear back-extrapolation of rodent TD50 data to a theoretical lifetime cancer risk below 1 in 100,000. The shorter the exposure, the higher the allowable daily intake, because less time means less accumulated risk at the same exposure level.
Each limit corresponds to a theoretical lifetime cancer risk below 1 in 100,000 for a 50 kg adult. Values scale for paediatric populations.
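The arithmetic behind these limits is short enough to write out. The sketch below makes three assumptions that should be checked against the current ICH M7 text: the staged TTC values (120, 20, 10, and 1.5 µg/day), the divisor of 50,000 (a 50% tumour incidence scaled down to 1 in 100,000), and the 50 kg body weight stated above. Function names are hypothetical.

```python
# Assumed staged limits: (maximum treatment duration in months, µg/day).
STAGED_TTC_UG_PER_DAY = [
    (1, 120.0),             # up to 1 month
    (12, 20.0),             # more than 1 month, up to 12 months
    (120, 10.0),            # more than 1 year, up to 10 years
    (float("inf"), 1.5),    # more than 10 years to lifetime
]


def staged_ttc(duration_months: float) -> float:
    """Default acceptable intake (µg/day) for a given treatment duration."""
    for max_months, ttc in STAGED_TTC_UG_PER_DAY:
        if duration_months <= max_months:
            return ttc
    raise ValueError("duration must be a number of months")


def ai_from_td50(td50_mg_per_kg_day: float, body_weight_kg: float = 50.0) -> float:
    """Compound-specific acceptable intake (µg/day) by linear extrapolation.

    TD50 is the dose giving 50% tumour incidence; dividing by 50,000
    scales that linearly to a 1-in-100,000 risk.
    """
    mg_per_day = td50_mg_per_kg_day / 50_000 * body_weight_kg
    return mg_per_day * 1000.0


def concentration_limit_ppm(ai_ug_per_day: float, max_daily_dose_g: float) -> float:
    """µg of impurity per g of drug substance, which is ppm."""
    return ai_ug_per_day / max_daily_dose_g


print(staged_ttc(6), concentration_limit_ppm(staged_ttc(240), max_daily_dose_g=0.5))
```

For a lifetime-use drug dosed at 0.5 g/day, the default 1.5 µg/day limit works out to 3 ppm of impurity in the drug substance.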
Before anyone runs a QSAR model, impurities have to be synthesised into existence, caught by analytical chemistry, and characterised into a structure. After the classification, the result has to survive CTD assembly, agency review, and a 20-year lifecycle of process changes. The seven-step workflow below lives inside one phase of a much longer pipeline.
Every impurity moves through one of three pathways defined by ICH M7: (1) dual QSAR — if both rule-based and statistical engines return non-mutagenic, the impurity is Class 5 and no wet-lab test is needed; (2) QSAR + Ames — positive predictions trigger a confirmatory wet-lab test; (3) direct Ames — some programs skip computational assessment entirely.
N-nitrosamines move through the specialised CPCA track added in R2 (2023), which assigns an acceptable intake by potency category rather than via the default TTC.
This is the phase Unbury addresses. The seven-step drill-down is shown below.
hours to days per batch
ICH M7 class + rationale flows into control strategy + report
The three pathways mean this is not a linear process but a branching tree. A single impurity can enter Phase 3 multiple times — first as a computational screen, later for follow-up after a model update or a process change.
A single impurity from a real-sized assessment, played in calendar time. Press play, or toggle the Unbury path to see the same assessment on one surface.
A degradation-product impurity was isolated by MS + NMR. Its proposed structure is written as a SMILES string, a text notation encoding atoms, bonds, and connectivity. That string is what enters the workflow.
Derek Nexus is one step. Sarah Nexus is another step. Every existing tool is a step. Unbury is the workflow that binds them: one surface, one audit trail, one click.
Limitations worth naming. Some are product-design decisions. Others are open scientific problems. A good tool doesn't pretend they aren't there.
Every impurity is assessed individually. Mixture toxicology — how impurities interact with each other or with the drug substance — is outside scope and remains an unsolved scientific problem.
ICH M7 assesses the impurity as filed. In-vivo metabolism can produce mutagenic fragments not predicted from the parent structure; those fall under other ICH guidelines (Q3C, S1A/B, S2(R1)) on separate workflows.
Pharma can skip QSAR entirely and run direct Ames; that is also ICH M7 compliant. Software is a cost-reduction path, not a regulatory mandate.
Derek and Sarah update annually; ChEMBL and Tox21 release refreshed corpora. An assessment filed in 2022 may need re-running if a 2026 alert-set update changes a call.
No SMILES, no prediction. Analytical characterisation remains the upstream rate-limiting step, with no amount of QSAR investment able to compensate.
FDA, EMA, PMDA, Health Canada, and the MHRA each apply ICH M7 with local variations. A filing accepted in one jurisdiction may require supplementary data in another.
Incumbents have regulatory credibility. Academic tools have modern models. Nobody is in the upper-right quadrant. That quadrant is what ICH M7 compliance needs now.
Dual prediction engine. LLM-powered expert review. CTD reports. Audit trail. Model version tracking. Nitrosamine CPCA. Cloud-native from day zero.
Positioned below Lhasa membership
Earning regulatory track record.
Lhasa ships forty years of curated alerts; we would not build Derek Nexus 2.0 even if we could. They are actively migrating products to cloud (Vitic, Mirabilis, Kaptis), so the window on cloud-native ICH M7 workflow is real but finite. Our bet is on what they have not built yet: integrated LLM expert review, shared audit trails, per-assessment pricing, self-serve onboarding. Their not-for-profit membership structure disincentivises undercutting those prices.
One platform for the whole ICH M7 workflow, from SMILES paste to CTD-ready report. Below are three screens that together replace the seven disconnected tools above.
Rule-based and statistical (QSAR) models run in parallel on every compound. Satisfies ICH M7's complementary-methods requirement in a single click. Results side-by-side with the substructures that fired, model confidence, and an applicability-domain flag.
Built on RDKit (structural matching) and scikit-learn / XGBoost (fingerprint-based classifier).
Mock classifier using simple pattern matching on SMILES strings. The real engine uses RDKit SMARTS matching + a Morgan-fingerprint XGBoost model on Ames training data.
When the two engines disagree, incumbents dump it into an email chain. Unbury routes it through an LLM that analyses the structural alert in context, retrieves analogous compounds from a frozen public corpus, and drafts regulatory-grade reasoning with verified citations. A trained reviewer signs off.
Not frontier research; academic LLM tools for QSAR interpretation already exist (the open-source O-QT Assistant, 2024). What is novel is packaging this as a regulatory-grade, audit-trailed, commercially supported workflow step inside an ICH M7 product.
Legitimate objection. The answer has to be structural, not hopeful. Four layers, stacked.
Analog compounds are retrieved from a frozen ChEMBL + Tox21 snapshot via structural similarity search. The LLM can only cite what the retriever actually returned. It never invents a ChEMBL ID.
Every identifier in the draft reasoning is round-tripped against the database before render. Broken or unresolved IDs are surfaced as errors, never silently passed through.
The LLM drafts. A trained reviewer accepts, edits, or overrides. No prediction exits the system as an LLM-only artefact. The audit trail records the draft, the change, and the signer.
Every prompt, retrieval set, draft, edit, and override is written to an append-only audit table with a hash chain; chain heads commit to external WORM object storage. The reasoning behind any prediction is reproducible end-to-end, satisfying 21 CFR Part 11 §11.10(e).
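Two of these layers lend themselves to a short sketch: checking that every cited identifier came from the retriever, and chaining audit entries by hash so that later tampering is detectable. This is a minimal in-memory illustration with hypothetical names. It does not show the database round-trip, the WORM commit of chain heads, or anything else Part 11 would require in practice, and the identifiers are placeholders.

```python
import hashlib
import json
import re
from typing import List, Set

CHEMBL_ID = re.compile(r"\bCHEMBL\d+\b")
GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def unresolved_citations(draft: str, retrieved_ids: Set[str]) -> List[str]:
    """IDs cited in the draft that the retriever never returned."""
    return sorted(set(CHEMBL_ID.findall(draft)) - retrieved_ids)


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: List[dict], payload: dict) -> dict:
    """Append an entry whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "payload": payload, "hash": _digest(prev, payload)}
    log.append(entry)
    return entry


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


audit_log: List[dict] = []
append_event(audit_log, {"event": "draft", "text": "Analog CHEMBL25 supports the call."})
append_event(audit_log, {"event": "override", "signer": "reviewer-1"})
print(verify_chain(audit_log))
print(unresolved_citations("See CHEMBL25 and CHEMBL999999.", {"CHEMBL25"}))
```

Because each hash covers the previous one, publishing only the latest hash to external storage is enough to pin the whole history behind it.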
Section 3.2.S.3.2 of the Common Technical Document, generated from every assessment. Methods used, results, expert review rationale, classification, intake calculations, control strategy, full audit trail. The format regulators expect in submissions.
Plus model version tracking across the 6–7 year drug development lifecycle — which build of which engine was used for which prediction.
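Version tracking comes down to storing the engine builds alongside each result and comparing them with what is deployed today. A minimal sketch, assuming a simple record shape; the field names, engine names, and version strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Assessment:
    impurity_smiles: str
    ich_m7_class: int
    engine_versions: Dict[str, str]  # engine name -> build used for this call


def stale_engines(assessment: Assessment, current: Dict[str, str]) -> List[str]:
    """Engines whose deployed build differs from the one pinned at filing."""
    return sorted(
        name
        for name, pinned in assessment.engine_versions.items()
        if current.get(name) != pinned
    )


filed = Assessment(
    impurity_smiles="CC(=O)Oc1ccccc1C(=O)O",
    ich_m7_class=5,
    engine_versions={"alerts": "2022.1", "statistical": "1.4.0"},
)
print(stale_engines(filed, {"alerts": "2026.1", "statistical": "1.4.0"}))
```

A non-empty result does not mean the call is wrong, only that the assessment is a candidate for re-running against the newer build.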
Sample is illustrative. Real CTD output is .docx/.pdf with full audit trail, citations, and regulator-specific formatting.
Context, not target list. Anyone producing, importing, or regulating chemicals in a jurisdiction with modern toxicology law has a use for this work.
ICH M7 mutagenicity screening, DILI prediction, hERG cardiotoxicity, ADMET triage
FDA, EMA, ICH compliance
One late-stage drug failure from unexpected toxicity costs $500M–$2.6B. Computational screening is cheap insurance.
EU REACH compliance, OECD test guidelines, environmental hazard assessment
REACH requires toxicity data on ~30,000 chemicals annually
REACH accepts QSAR predictions as weight-of-evidence contributions.
Ingredient safety assessment for formulations
EU banned animal testing for cosmetics in 2013. 44 countries with bans.
Fastest-growing segment. Indie / DTC brands launching constantly, each needing safety files.
Safety assessments for food additives and agrochemicals
EFSA, FDA CFSAN, EPA FIFRA
Increasing regulatory pressure on chronic dietary exposure endpoints.
Chemicals in water, soil, and ecosystems
EPA toxicity assessment, state programs
EPA committed to eliminating all mammalian study requests by 2035.
Offer computational toxicology as a service to their own clients
Downstream pharma demand
One CRO deployment reaches dozens of customers indirectly.
Estimates reflect relative industry spend on computational safety assessment software. Pharma ICH M7 is the initial beachhead; the mechanism generalises across all segments above.
The same workflow shape — dual engines, LLM expert review, audit trail, CTD output — applies across every regulated toxicity endpoint and every jurisdiction in active non-animal transition. The wedge is one endpoint in one industry. The platform is the rest.
Each row below is a separate regulated toxicity endpoint. Each is, at its core, the same product: dual QSAR plus LLM expert review plus CTD-grade output. Different dataset, different alert library, different intake calculation. Same engine.
DNA damage via bacterial reverse mutation assay (the Ames test)
ICH M7 · FDA · EMA · PMDA
The wedge. Regulation formally substitutes QSAR for wet lab; dual-engine is a mandate.
ICH M7 is among the first places software formally substitutes for a wet-lab test. It is not the last. Other regulated industries and jurisdictions are moving through the same transition on staggered timelines.
Mutagenic impurities in small-molecule drugs
Other impurity, carcinogenicity, repro-tox guidelines
~30,000 chemicals/yr requiring safety data
Finished products + ingredients
Pesticides + industrial chemicals
Dietary exposure safety assessment
A prediction sold is a prediction spent. The asset we are building is not the prediction itself but the by-product of running it: the overrides, the accepted filings, the analog graph.
Every customer-opt-in assessment produces a labelled data point: input SMILES, dual-engine predictions, structural alerts, LLM reasoning draft, toxicologist override, final regulator-accepted classification. Over three to five years this becomes a real-world corpus of what QSAR got wrong and why. Pharma IP protections mean this asset compounds only where customers agree to contribute — modelled after the Lhasa Vitic consortium, which is its own proof that opt-in sharing works when structured right.
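The labelled data point described above maps naturally onto a record type. The sketch below is one possible shape, not the product's schema: every field and function name is hypothetical, and the "QSAR got it wrong" test is deliberately narrow (both engines agreed, and the accepted classification contradicted them).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AssessmentRecord:
    smiles: str
    rule_based_call: str                   # "mutagenic" or "non-mutagenic"
    statistical_call: str                  # "mutagenic" or "non-mutagenic"
    structural_alerts: List[str]
    llm_draft: str
    toxicologist_override: Optional[str]   # None if the draft was accepted
    final_class: Optional[int]             # regulator-accepted class, once known
    opted_in: bool = False                 # customer agreed to contribute


def shareable(records: List[AssessmentRecord]) -> List[AssessmentRecord]:
    """Only opted-in records with a final classification enter the corpus."""
    return [r for r in records if r.opted_in and r.final_class is not None]


def engines_agreed_but_wrong(record: AssessmentRecord) -> bool:
    """Both engines made the same call and the accepted class contradicts it."""
    if record.final_class is None or record.rule_based_call != record.statistical_call:
        return False
    predicted_clear = record.rule_based_call == "non-mutagenic"
    return predicted_clear != (record.final_class == 5)


# Placeholder records, invented for illustration.
r1 = AssessmentRecord("C", "non-mutagenic", "non-mutagenic", [], "draft", None, 5, opted_in=True)
r2 = AssessmentRecord("CC", "non-mutagenic", "non-mutagenic", [], "draft", "reviewer disagreed", 3, opted_in=True)
r3 = AssessmentRecord("CCC", "mutagenic", "non-mutagenic", ["aromatic amine"], "draft", None, 5)
print(len(shareable([r1, r2, r3])), engines_agreed_but_wrong(r2))
```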
Every FDA-accepted submission citing Unbury's assessment is a future-sales asset. Credibility in regulated markets compounds non-linearly: the first ten acceptances are hard-won, the next hundred are routine, and competitors cannot shortcut the filing history. Realistic horizon is three to five years of early-customer submissions before this crosses from promise to moat.
LLM expert review retrieves structural analogs from public corpora on every conflict. The retrieval decisions themselves — which analog best resolves which alert in which context — become a cross-reference graph that does not exist in any single public database. This piece is vendor-controlled; it does not require customer opt-in because it is derived entirely from public source data.
None of these assets exist at T+0. All three compound from the first assessment onward. The value of the platform at year five is not the predictor at year five — it is the five-year-old dataset of everything the predictor got right and wrong.
Unbury at maturity is the platform where every computational safety assessment is produced and filed. One place for every toxicity endpoint, every regulatory framework in active non-animal transition, every industry whose products cross a regulator's desk. The wedge is pharma mutagenicity because that is where the regulation has formally arrived. The rest is a matter of the transition catching up.
Public comparables in the adjacent space — Certara at ~$1.1B cap on $419M revenue, Simulations Plus at ~$300M on $79M, Veeva at ~$26B on $3.2B — are the shape of the category at maturity. None are operating-system-scale yet. That slot is open.
Complementary. Veeva Vault RIM is the filing cabinet; Unbury generates the scientific evidence that goes inside the filings. Natural integration partner, plausible long-term acquirer.
Biosimulation giant. Acquired Chemaxon for ~$90M (October 2024). Strategic fit: add Unbury's ICH M7 workflow to their PBPK-led footprint at FDA.
Traditional ML ADMET vendor. Market cap compressed in 2025. Architecturally distant from cloud-native; acquisition target if they attempt a rebuild.
Dotmatics (IDBS + Prism + SnapGene) acquired by Siemens in July 2025 for $5.1B. Siemens now holds a life-sciences R&D platform; a regulatory-grade safety workflow fits directly.
For engineers curious about the stack: the prediction itself is startlingly plain. The difficulty sits in the data curation, the validation protocols, the regulatory workflow, and the LLM integration.
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

# the molecule
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# featurize to a 2048-bit Morgan fingerprint (radius 2)
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

# predict; `model` is a RandomForestClassifier already fitted on
# ChEMBL + ToxCast Ames data (training not shown)
pred = model.predict_proba([list(fp)])[0]
# → [0.96, 0.04]  (non-mutagenic, mutagenic)

# applicability domain check; `is_within_training_space` and `train_set`
# are defined elsewhere (not shown)
in_domain = is_within_training_space(fp, train_set, threshold=0.35)

That is the core of the statistical engine. The binary classifier is a few hundred lines. The structural alert engine is RDKit SMARTS matching against a curated pattern library. Neither requires a foundation model, a GPU farm, or frontier research.
Which is why open Ames classifiers — ADMETlab 3.0, Chemprop on TDC, VEGA QSAR — sit in the same performance band as Sarah Nexus on published benchmarks, free. Sarah's moat is not accuracy. It is regulatory provenance: decades of FDA/EMA submission history, validated training-data audit trails, documented applicability-domain coverage, and model-version discipline that holds up in a review filed today and re-inspected in 2031.
The alert side is the deeper gap. Derek Nexus is four decades of hand-curated structural alerts with case histories and references. Open equivalents (Benigni-Bossa, Toxtree) exist and are usable, but the coverage and annotation quality are not yet at Derek's level. Closing that gap is content work, not ML work.
Unbury's work is the wrapper. Scaffold-split validation; applicability-domain estimation; an extensible Benigni-Bossa-seeded alert library; LLM-driven expert review with citation resolution; CTD-formatted reports; model-version pinning across drug development's 6-to-7-year timeline. A competitive model plus the validation stack that makes it regulator-grade.
Computational toxicology does not exist because it is cheaper or faster, though it is both. It exists because there is a prior cost nobody wanted to keep paying.
A working estimate. USDA tracks ~2M animals of regulated species annually (dogs, primates, rabbits, and, since 2023, birds and voluntarily-disclosed fish). The ~95% of laboratory animals that are mice, rats, and fish are not tracked in official US statistics; advocacy-group estimates put the full total in the tens of millions. Global 2015 peer-reviewed estimate: ~192 million.
Take an illustrative program with 150 impurities. At industry-quoted $5K-$15K per Ames, the fully-wet-lab bill is roughly $750K to $2.25M, with several weeks of calendar time per compound at typical parallelism. Dual-QSAR screens out the mutagenicity-safe fraction at Class 5, leaving only the remainder for the lab. The fraction depends on program chemistry; the wet-lab delta is the largest line item in the assessment budget either way.
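The arithmetic for the illustrative program is easy to reproduce. In the sketch below the per-assay price range comes from the industry quotes cited above; the cleared fraction is a free parameter because no program-level statistic is published, and the 0.5 used in the example is purely hypothetical.

```python
from typing import Tuple


def wet_lab_cost_range(
    n_impurities: int,
    cost_low: int = 5_000,
    cost_high: int = 15_000,
    cleared_fraction: float = 0.0,
) -> Tuple[int, int]:
    """Low and high wet-lab Ames bill (USD) for the impurities left over
    after a given fraction clears at Class 5 in software."""
    remaining = round(n_impurities * (1 - cleared_fraction))
    return remaining * cost_low, remaining * cost_high


# Fully wet-lab program: every impurity goes to the assay.
print(wet_lab_cost_range(150))
# Hypothetical program in which half the manifest clears in software.
print(wet_lab_cost_range(150, cleared_fraction=0.5))
```

Whatever the true fraction is for a given program, the saving scales linearly with it, which is why the wet-lab delta dominates the budget comparison.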
The statistical models are trained on historical animal testing data. ToxCast, ChEMBL, Tox21. Those data points came from experiments that are already done. The animals are gone. Using the record of their suffering to keep future animals out of labs is the most redemptive possible use of that data.
Unbury's customers are, by definition, companies that currently test on animals. That is why they need Unbury — to stop, or to test fewer, in the places where the regulator now permits computational substitution. The product is abolitionist in its outcome; the commercial path runs through the same companies the outcome is designed to change.
The regulatory framework now permits a software prediction to stand where an animal test used to stand. The question stops being is this possible and becomes what is the product that makes it routine.
Delaware Public Benefit Corporation. Stated purpose: reducing and replacing the use of animals in safety testing through computational methods. The PBC charter legally protects the mission from being compromised by future capital pressure.
No tension between profit and mission. Every dollar of revenue corresponds to assessments that replaced an animal test. There is no version of this product that works for the customer and fails the mission.
ICH M7 mutagenicity is the first wedge because it is one of the earliest places where regulation formally substitutes software for a lab test. The same mechanism generalises: REACH, cosmetics, food additives, agrochemicals, environmental assessment.
The answers a thoughtful reader is probably already preparing. Directly and without padding.