A practical look at H2M, a pipeline mapping human variants to mouse equivalents to improve GEMMs, genome editing design, and translational drug discovery.

Predicting Human Variants in Mice for Better Drug Models
A single detail derails more preclinical programs than most teams admit: the mouse model doesn't actually match the human mutation you think you're testing. Not "close enough." Not "same gene." What you need is the same variant, in the same local sequence context, producing the same functional change.
That mismatch shows up later as confusing efficacy signals, non-transferable biomarkers, and target hypotheses that look clean in rodents but fall apart in humans. If you're working in pharma or biotech, you've probably seen a project lose months because a "faithful" genetically engineered mouse model (GEMM) turned out to be a best-effort approximation.
A new computational framework called H2M (human-to-mouse) tackles that exact problem by building a standardized, large-scale "dictionary" that maps clinically observed human variants to engineerable mouse equivalents, going beyond simplistic ortholog mapping. For anyone serious about AI in pharmaceuticals and drug discovery, this is one of those unglamorous but high-impact advances: it improves the inputs to preclinical research, which improves everything downstream.
Why mouse models still fail: variant mismatch is the quiet culprit
Answer first: Many GEMMs fail as translational tools because they replicate a gene but not the human mutation's exact nucleotide or protein change, and the local genomic context can alter the biology.
We've gotten used to saying "the mouse is genetically similar," but similarity isn't identity. Three common failure modes show up repeatedly:
1) Orthologs aren't one-to-one in practice
Even when two genes are labeled orthologs, position-level correspondence is not guaranteed. Alternative transcripts, exon boundaries, and codon usage differences can mean the "same" edit produces different consequences.
2) Same DNA change ≠ same protein change
A nucleotide substitution at a corresponding position in mouse can yield (see the sketch after this list):
- A different amino acid substitution
- No amino acid change (silent)
- A frameshift or altered splicing effect due to context
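To make the first of those concrete, here is a minimal sketch, using a hand-picked fragment of the genetic code, of how the "same" single-nucleotide substitution can produce different amino acid changes when human and mouse happen to use different synonymous codons. The codons and positions are invented for illustration; this is not H2M's code.
```python
# Minimal sketch: the same G>A substitution at the aligned codon position
# gives different protein consequences because the two species use
# different synonymous codons for Arg. Codons are illustrative.

CODON_TABLE = {"CGC": "R", "CAC": "H", "CGG": "R", "CAG": "Q"}

def apply_snv(codon: str, offset: int, alt: str) -> str:
    """Apply a single-nucleotide substitution at an offset within a codon."""
    return codon[:offset] + alt + codon[offset + 1:]

human_codon, mouse_codon = "CGC", "CGG"  # both encode arginine (R)
offset, alt = 1, "A"                     # the "same" G>A DNA change

human_edit = apply_snv(human_codon, offset, alt)  # CAC -> histidine (H)
mouse_edit = apply_snv(mouse_codon, offset, alt)  # CAG -> glutamine (Q)

print(f"human: {CODON_TABLE[human_codon]}>{CODON_TABLE[human_edit]}")  # R>H
print(f"mouse: {CODON_TABLE[mouse_codon]}>{CODON_TABLE[mouse_edit]}")  # R>Q
```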
3) Local sequence context changes functional impact
A missense change in a conserved domain might behave similarly across species, while the same type of change in a less conserved region can produce species-specific effects. That's a big deal for target validation and mechanism-of-action work.
What's been missing is a practical, standardized way to answer: "Can we model this specific human variant in mouse, and if so, what's the most faithful edit?"
What H2M does differently: from "orthologs" to a variant engineering dictionary
Answer first: H2M is a computational pipeline that takes human variant data and outputs predicted mouse equivalents at both the DNA (nucleotide) and protein (peptide) effect levels, so teams can engineer GEMMs that better mirror clinical reality.
H2M runs a four-step workflow (sketched in code after this list):
- Find orthologous genes (using integrated homolog catalogs)
- Align transcripts or proteins (transcripts for noncoding variants; peptides for coding)
- Simulate the mutation
- Model functional effects and produce standardized outputs
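Here is a schematic sketch of that workflow. Every name in it (the stub ortholog table, the toy variant type, the dictionary it returns) is a hypothetical stand-in meant to show the shape of the pipeline, not H2M's actual API.
```python
# Schematic sketch of the four-step workflow. All names and the stub data
# are hypothetical stand-ins, not H2M's actual API or outputs.

from dataclasses import dataclass

@dataclass
class HumanVariant:
    gene: str        # human gene symbol
    hgvs: str        # DNA-level change, e.g. "c.35G>A" (invented example)
    is_coding: bool  # drives the choice of alignment in step 2

ORTHOLOGS = {"KIT": "Kit"}  # step 1: an integrated homolog catalog (stubbed)

def map_variant(v: HumanVariant):
    mouse_gene = ORTHOLOGS.get(v.gene)  # step 1: find the mouse ortholog
    if mouse_gene is None:
        return None                     # no ortholog: variant not mappable
    # Step 2: align peptides for coding variants, transcripts for noncoding.
    alignment_level = "peptide" if v.is_coding else "transcript"
    # Step 3: simulate the mutation at the aligned mouse position (elided).
    # Step 4: emit standardized DNA-level (NCE) and protein-level (PCE) effects.
    return {"mouse_gene": mouse_gene, "alignment": alignment_level,
            "nce": v.hgvs, "pce": None}  # PCE derivation elided in this sketch

print(map_variant(HumanVariant("KIT", "c.35G>A", True)))
```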
The practical difference is that H2M explicitly distinguishes between:
- NCE (Nucleotide Change Effect): the DNA-level alteration
- PCE (Peptide Change Effect): the resulting amino acid change for coding variants
This matters because drug discovery often cares about PCE (protein function), while genome editing logistics often start at NCE (what you can edit at the locus).
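The distinction is easy to show on a toy coding sequence: the sketch below applies a DNA-level change (an NCE) and then checks whether it has any protein-level consequence (a PCE). The nine-base "gene" and the variant are invented for illustration.
```python
# Toy illustration of NCE vs. PCE: a DNA change exists either way, but the
# protein may or may not change. Sequence and variant are invented.

CODON_TABLE = {"ATG": "M", "AAG": "K", "AAA": "K", "TGA": "*"}

def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))

cds = "ATGAAGTGA"                          # translates to M-K-stop
pos, alt = 5, "A"                          # NCE: c.6G>A (0-based index 5)
edited = cds[:pos] + alt + cds[pos + 1:]   # "ATGAAATGA", still M-K-stop

print(f"NCE: c.{pos + 1}{cds[pos]}>{alt}")              # the DNA-level effect
print(f"PCE: {translate(cds)} -> {translate(edited)}")  # silent: no PCE here
```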
The three modeling strategies (and why they matter to pharma)
H2M applies three strategies depending on what can be faithfully mirrored:
Strategy I: NCE-only modeling
- Use the same DNA-level change in mouse.
- Most helpful for noncoding and frameshifting events where the goal is the genomic alteration itself.
Strategy II: NCE-for-PCE modeling
- The same DNA change also produces the same amino acid change.
- This is the "high-confidence" scenario for cross-species comparability.
Strategy III: Extended NCE-for-PCE modeling
- If the same DNA change doesn't yield the same amino acid change, H2M searches codon alternatives to achieve the same PCE in mouse (a sketch of that search follows this list).
- This is where many legacy models quietly fail, because teams stop at Strategy I and assume protein equivalence.
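Here is a minimal sketch of the Strategy III idea, under the assumption that the search enumerates synonymous-codon targets and ranks them by how many bases must change. The codons are illustrative, and the real pipeline's search is certainly more involved.
```python
# Sketch of an extended NCE-for-PCE search: the human DNA change does not
# reproduce the desired amino acid change in mouse, so we look for codon
# edits that do. Codons are illustrative, not taken from H2M.

CODONS_FOR = {"H": ["CAT", "CAC"]}  # partial reverse genetic-code lookup

def codon_edits(mouse_codon: str, target_aa: str):
    """Rank candidate codon replacements by how many bases must change."""
    options = []
    for target in CODONS_FOR[target_aa]:
        n_changes = sum(a != b for a, b in zip(mouse_codon, target))
        options.append((n_changes, target))
    return sorted(options)

# Suppose the human change was Arg>His but the mouse codon for that Arg is
# CGG: no single-base edit reaches His, yet a two-base edit preserves the PCE.
print(codon_edits("CGG", "H"))  # [(2, 'CAC'), (2, 'CAT')]
```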
If you're building preclinical packages around a target hypothesis, Strategies II and III are the difference between "we edited something" and "we edited the biology we meant to test."
Scale and coverage: what the numbers say (and what they imply)
Answer first: H2M's first public database includes 3,171,709 human-to-mouse mutation mappings and predicts that more than 80% of human variants can be modeled in mice.
Using clinically observed variants curated from large resources (cancer-focused and clinical interpretation datasets), H2M:
- Mapped 96% of input human genes to mouse orthologs
- Produced a database spanning over 3.17 million variant mappings
- Reported that >80% of human variants are predicted to be modelable in mouse
Two nuance points are especially relevant for drug discovery teams:
Coding variants are easier to model than noncoding variants
That's consistent with higher conservation in coding regions. If your therapeutic hypothesis depends on regulatory variants, deep intronic changes, or species-specific enhancers, you should assume higher risk and demand stronger validation.
Indels remain harder than substitutions
H2M observed lower coverage for indels than for single or multinucleotide substitutions. In practical terms: if your project is anchored on a recurrent indel hotspot, expect higher engineering complexity and more careful benchmarking.
"Flank size" is a better reality check than gene conservation alone
Answer first: H2M introduces flank size, the amount of locally conserved sequence around a variant, as a practical proxy for whether a mutation sits in a region likely to behave similarly across species.
Flank size is defined as:
- For noncoding variants: the number of conserved nucleotides on both sides of the variant
- For coding variants: the number of conserved amino acids on both sides of the variant (a toy computation follows this list)
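As a rough illustration, here is one way such a measure could be computed on a gap-free alignment: walk outward from the variant position and count matching residues on each side. Treating the reported flank size as the smaller of the two sides is my assumption, and the sequences are invented.
```python
# Toy flank-size computation on a gap-free human/mouse peptide alignment.
# Taking min(left, right) as "conserved on both sides" is an assumption,
# and the sequences are invented for illustration.

def flank_size(human: str, mouse: str, pos: int) -> int:
    left = 0
    while pos - 1 - left >= 0 and human[pos - 1 - left] == mouse[pos - 1 - left]:
        left += 1
    right = 0
    while pos + 1 + right < len(human) and human[pos + 1 + right] == mouse[pos + 1 + right]:
        right += 1
    return min(left, right)

human = "MKTLLVAGG"
mouse = "MKTLLVSGG"  # mismatch two residues to the right of the variant

print(flank_size(human, mouse, 4))  # variant at index 4 -> flank size 1
```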
In the H2M database:
- 50% of coding mutations have flank size ≤ 18 amino acids
- 50% of noncoding mutations have flank size ≤ 14 nucleotides
As flank size requirements increase (demanding more local conservation), the percentage of variants that can be modeled decreases, because you're filtering toward regions of high homology.
Here's the stance I take: teams should stop treating "same gene" as sufficient and start using a local conservation threshold as a go/no-go gate for expensive in vivo work.
A concrete illustration: KIT variants and conserved functional domains
One example analyzed with H2M focuses on KIT (human) and Kit (mouse). The analysis shows that missense variants in certain functional domains (notably transmembrane/juxtamembrane and kinase regions) are more likely to be faithfully modelable.
That aligns with what most biologists already suspect: conserved domains are more likely to carry conserved function. H2M turns that intuition into a searchable, standardized output you can act on.
From prediction to execution: guiding base editing and prime editing design
Answer first: H2M doesn't just say "this variant is modelable"; it supports standardized outputs that downstream tools can use to design base-editing and prime-editing guides for precision engineering.
This is where the "AI in drug discovery" angle becomes very tangible. Variant modeling isn't valuable unless it shortens the path to experiments.
In a demonstrated subset of cancer-associated variant pairs, H2M was used in combination with prime-editing guide design workflows to produce:
- 24,680 base-editing gRNAs covering 4,612 mutations
- 48,255 prime-editing gRNAs covering 9,651 mutations
For preclinical leaders, the key implication is speed and standardization:
- Faster feasibility assessment (can we build it?)
- Faster design iteration (how should we edit it?)
- Better comparability across programs (same formats, same nomenclature)
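On that last point, here is a record-style sketch of what a standardized mapping artifact might carry so that guide-design tooling can consume the same inputs across programs. The field names are hypothetical assumptions, not H2M's actual output schema.
```python
# Hypothetical shape of a standardized variant-mapping record; the field
# names are assumptions for illustration, not H2M's actual schema.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VariantMapping:
    human_gene: str           # e.g. "KIT"
    mouse_gene: str           # e.g. "Kit"
    human_variant: str        # the clinically observed DNA change
    mouse_nce: str            # the DNA-level edit to make in mouse
    mouse_pce: Optional[str]  # protein-level consequence; None if noncoding
    strategy: str             # "I", "II", or "III", per the strategies above
    flank_size: int           # local conservation around the variant

# Because every program emits the same record shape, base-editing and
# prime-editing design tools can be pointed at one interoperable input.
```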
If your organization is investing in variant-to-function pipelines, these kinds of interoperable artifacts are what make scaling possible.
Practical applications in pharma: where this changes decisions
Answer first: The biggest impact of variant mapping tools like H2M is decision quality; teams choose the right model, the right edit strategy, and the right experiments earlier.
Here are the most practical ways teams can use a human-to-mouse variant dictionary in preclinical and translational workflows.
1) Target validation with clinically realistic alleles
Rather than testing a convenient knockout or overexpression, teams can prioritize clinically observed mutations that better reflect patient biology, especially important in oncology and rare disease.
2) Biomarker strategy that survives translation
A biomarker tied to an imprecise model can look "predictive" in mice and evaporate in humans. By improving variant fidelity, you reduce the chance your biomarker is an artifact of the model.
3) Rational selection of GEMM vs. alternative models
If H2M suggests the variant isn't modelable with acceptable flank size or requires complex extended modeling, that's a signal to consider:
- Humanized systems
- Organoids with patient-derived edits
- In vivo alternatives focused on pathway-level perturbation rather than exact alleles
4) Prioritization of variants for functional screening
H2M-style mapping supports high-throughput prioritization: you can triage variants by modelability, conservation, and predicted functional impact before spending on animal generation.
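As a sketch of what that triage can look like in practice: filter mapped variants on modelability and a local-conservation gate before committing to animal generation. The records and the flank-size threshold below are illustrative choices, not values recommended by H2M.
```python
# Illustrative triage: gate variants on modelability and local conservation
# before spending on animal generation. Records and threshold are invented.

variants = [
    {"id": "VAR-1", "modelable": True,  "flank_size": 18, "strategy": "II"},
    {"id": "VAR-2", "modelable": True,  "flank_size": 4,  "strategy": "III"},
    {"id": "VAR-3", "modelable": False, "flank_size": 0,  "strategy": None},
]

MIN_FLANK = 10  # project-specific go/no-go gate on local conservation

go_list = [v["id"] for v in variants
           if v["modelable"] and v["flank_size"] >= MIN_FLANK]
print(go_list)  # ['VAR-1']
```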
FAQ: questions teams ask once they try to operationalize this
"If >80% of variants are modelable, does that mean mouse is 'good enough'?"
No. Modelability is not equivalence. It tells you the edit can be made in a corresponding region. You still need phenotypic validation, especially for regulatory variants.
"Should we always force the same amino acid change (PCE) even if the DNA change differs?"
Often yes for mechanism-of-action questions anchored on protein function. But not always: if the project is about DNA-level regulatory mechanisms, you may care more about NCE.
"How does this connect to AI-driven drug discovery?"
AI models are only as good as the biological truth they're trained and tested against. Better preclinical genotype fidelity improves target validation data, mechanistic labels, and translational signals, which improves downstream AI tasks like response prediction and biomarker discovery.
Where this points next for AI in pharmaceuticals and drug discovery
Preclinical research is getting more computational every year, but the constraint hasn't changed: you still need experimental systems that reflect human biology closely enough to trust the readouts.
Tools like H2M push the field toward a more disciplined standard: if you can't specify the human variant you're modeling, and show how you matched its DNA and protein consequences, you're not doing precision preclinical science.
If you're building an AI-enabled drug discovery pipeline, this is a good place to tighten the bolts. The cleanest machine learning model in the world can't rescue noisy biology.
If your team wants to reduce model risk, shorten iteration cycles, and improve translational confidence, start by auditing your current GEMMs: Which ones truly match the clinical variants you're using to justify the program, and which ones are "close enough" on paper?