Predict Cell Morphology With AI Before You Run Assays
A single high-content imaging screen can burn through weeks of lab time, a mountain of reagents, and more analyst hours than anyone wants to admit. Yet teams still end up asking the same question: Which perturbations are worth imaging at all? When the space of possible drugs, doses, gene edits, and combinations explodes into the millions, “just run the experiment” stops being a plan.
A recent line of research in generative AI for drug discovery argues for a different first step: simulate the microscope readout before you touch a plate. A model called MorphDiff does something that used to sound like science fiction but now looks oddly practical: generating realistic, multi-channel cell images conditioned on gene expression. If you have the transcriptomic signature of a perturbation, you can preview what the cells are likely to look like.
I’m going to walk through what MorphDiff is really doing, why it matters to phenotypic screening, and where teams can use it today without fooling themselves. Then I’ll connect the dots to a surprising place: energy and utilities, where the same “simulate outcomes before implementation” mindset is quickly becoming the difference between confident decisions and expensive guesswork.
MorphDiff in one sentence: transcriptome in, microscopy out
MorphDiff’s core idea is simple to state and hard to execute: learn the mapping between perturbed gene expression and cell morphology well enough to generate post-perturbation images.
In the standard workflow, teams often run some mix of:
- Transcriptomics (e.g., L1000-style profiles) to see what pathways respond
- High-content imaging (Cell Painting) to see what the cell does physically
The catch is that paired datasets—where you have both the expression profile and the microscopy for the same perturbation—are much rarer than gene expression alone. MorphDiff leans into that asymmetry: train on the paired subset, then generalize to the much broader world where you can obtain transcriptomic signatures more easily.
Two practical modes: generate from scratch or “edit” from control
MorphDiff supports two ways to produce predicted morphology:
- Gene-to-image (G2I): Start from noise and generate an image conditioned on the L1000 gene expression vector.
- Image-to-image (I2I): Start with a real control image and transform it into the perturbed version, guided by the same transcriptomic condition.
That second mode is underrated. In day-to-day discovery, most conversations aren’t about the absolute image—they’re about what changed relative to control. If your model can produce a believable “after” image from a real “before,” you get a more interpretable delta.
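To make the two modes concrete, here's a minimal PyTorch sketch of DDPM-style sampling with both entry points. The tiny denoiser, the latent and condition sizes, and the SDEdit-style partial-noising trick for I2I are illustrative assumptions on my part, not MorphDiff's published architecture:

```python
import torch
import torch.nn as nn

T = 1000  # diffusion steps
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

class Denoiser(nn.Module):
    """Toy noise predictor conditioned on an L1000 expression vector.
    978 landmark genes and a 64-dim latent are illustrative sizes."""
    def __init__(self, latent_dim=64, cond_dim=978):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 256), nn.SiLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, t, cond):
        t_feat = t.float().unsqueeze(-1) / T  # crude timestep encoding
        return self.net(torch.cat([z, cond, t_feat], dim=-1))

@torch.no_grad()
def sample(model, cond, z_control=None, start_t=T):
    """G2I: z_control is None -> start from pure noise at t=T.
    I2I: noise a control latent to an intermediate step (an SDEdit-style
    trick, one common way to do image-to-image with diffusion), then
    denoise under the perturbed-expression condition."""
    if z_control is None:
        z, start_t = torch.randn(cond.shape[0], 64), T
    else:
        ab = alpha_bar[start_t - 1]
        z = ab.sqrt() * z_control + (1 - ab).sqrt() * torch.randn_like(z_control)
    for t in reversed(range(start_t)):
        eps = model(z, torch.full((z.shape[0],), t), cond)
        mean = (z - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        z = mean + betas[t].sqrt() * torch.randn_like(z) if t > 0 else mean
    return z  # decode back to pixels with the image decoder (not shown)

model = Denoiser()
expr = torch.randn(2, 978)                 # perturbed L1000 signatures (toy)
z_g2i = sample(model, expr)                # gene-to-image: noise -> latent
z_ctrl = torch.randn(2, 64)                # stand-in for an encoded control image
z_i2i = sample(model, expr, z_control=z_ctrl, start_t=400)  # image-to-image
```

The design point to notice: both modes share one trained denoiser; I2I just changes where sampling starts.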
Why diffusion models fit biology better than most people expect
MorphDiff uses a latent diffusion model: rather than denoising full-resolution microscopy images directly, it works in a compressed latent space. Here’s the useful mental model:
- A morphology VAE compresses 5-channel Cell Painting images into a compact latent representation and can reconstruct them with high perceptual fidelity.
- A diffusion model learns to generate (or transform) those latent codes, step by step, while being guided by gene expression through attention.
This approach matters because biological imaging is messy in ways consumer images aren’t. Plates differ. Staining varies. Cells are heterogeneous. Diffusion models are naturally tolerant of noise and can represent distributions rather than collapsing to a single “most likely” look.
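Here's a toy sketch of that two-stage shape, assuming a small convolutional VAE and a single cross-attention block in which latent tokens (queries) attend to the expression condition (keys and values). Layer sizes and the single condition token are my simplifications, not the paper's design:

```python
import torch
import torch.nn as nn

class MorphVAE(nn.Module):
    """Toy stage 1: compress 5-channel Cell Painting crops 4x in each
    spatial dimension. Real VAEs are deeper; sizes are illustrative."""
    def __init__(self, channels=5, latent_ch=4):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, latent_ch, 4, stride=2, padding=1),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1),
        )

class CrossAttnBlock(nn.Module):
    """Toy stage 2 building block: latent tokens (queries) attend to the
    gene-expression condition (keys/values), with a residual update."""
    def __init__(self, dim=64, cond_dim=978):
        super().__init__()
        self.to_cond = nn.Linear(cond_dim, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, tokens, expr):
        cond = self.to_cond(expr).unsqueeze(1)     # (B, 1, dim)
        attended, _ = self.attn(tokens, cond, cond)
        return tokens + attended

vae, block, proj = MorphVAE(), CrossAttnBlock(), nn.Linear(4, 64)
img = torch.randn(2, 5, 64, 64)                  # 5-channel crops
z = vae.enc(img)                                 # (2, 4, 16, 16) latent
tokens = proj(z.flatten(2).transpose(1, 2))      # (2, 256, 64) latent tokens
tokens = block(tokens, torch.randn(2, 978))      # expression steers latents
```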
A useful rule: if your data is heterogeneous and your “ground truth” is a distribution, diffusion models are often a better fit than GAN-style approaches.
“Pretty images” aren’t the win. Feature fidelity is.
The paper’s most convincing argument isn’t that generated images look realistic. It’s that the generated morphology preserves the quantitative structure that biologists and ML teams actually use:
- Hundreds of CellProfiler features (texture, intensity, granularity, cross-channel correlations)
- Embedding-based fingerprints (e.g., DeepProfiler-style representations)
- Critically, the correlation structure between gene expression and morphology features
The reported result is that over 70% of generated feature distributions are statistically indistinguishable from real data in their tests, and that the model better captures the difference-from-control behavior on the most perturbed features.
That’s what makes simulated microscopy potentially useful for decision-making rather than demos.
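If you want to run the same kind of check on your own generated data, a per-feature two-sample Kolmogorov-Smirnov test is one reasonable way to estimate that fraction; the paper's exact statistical procedure may differ:

```python
import numpy as np
from scipy.stats import ks_2samp

def fraction_indistinguishable(real, generated, alpha=0.05):
    """Per-feature two-sample KS test on (n_cells, n_features) matrices
    of CellProfiler-style features. Returns the fraction of features
    where we fail to reject 'same distribution' at the given alpha."""
    pvals = np.array([
        ks_2samp(real[:, j], generated[:, j]).pvalue
        for j in range(real.shape[1])
    ])
    return float(np.mean(pvals > alpha))

# Sanity check: two samples from the same distribution should mostly pass.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 300))
fake = rng.normal(size=(500, 300))
print(f"{fraction_indistinguishable(real, fake):.0%} indistinguishable at alpha=0.05")
```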
The downstream payoff: better mechanism-of-action retrieval
Most screening orgs don’t generate images for fun. They want mechanism-of-action (MOA) retrieval: given a query perturbation, find other compounds with a similar biological effect.
MorphDiff’s claim (backed by retrieval experiments) is straightforward:
- Morphology generated from transcriptomics can outperform transcriptome-only retrieval.
- It can approach the retrieval accuracy achieved with real images.
- In top-k retrieval settings, the paper reports average improvements of 16.9% over the strongest image-generation baseline and 8.0% over transcriptome-only.
That’s a strong sign that simulated morphology isn’t redundant. It can encode complementary information—especially when two molecules look nothing alike chemically but converge on the same pathway.
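Operationally, top-k retrieval is simple to stand up. A minimal sketch, assuming cosine similarity over embeddings and MOA labels for the reference library (the paper's exact protocol may differ):

```python
import numpy as np

def topk_moa_hit_rate(query_emb, lib_emb, query_moa, lib_moa, k=5):
    """Cosine nearest-neighbor retrieval over morphology embeddings.
    A 'hit' = at least one of the top-k neighbors shares the query's MOA."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    lib = lib_emb / np.linalg.norm(lib_emb, axis=1, keepdims=True)
    topk = np.argsort(-(q @ lib.T), axis=1)[:, :k]   # k nearest per query
    hits = [
        query_moa[i] in {lib_moa[j] for j in row}
        for i, row in enumerate(topk)
    ]
    return float(np.mean(hits))

# Toy usage: queries are noisy copies of library items with matching MOAs.
rng = np.random.default_rng(1)
lib = rng.normal(size=(100, 64))
queries = lib[:10] + 0.1 * rng.normal(size=(10, 64))
moas = [f"moa_{i % 5}" for i in range(100)]
print(topk_moa_hit_rate(queries, lib, moas[:10], moas, k=5))
```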
Where this fits in an “AI in Pharmaceuticals & Drug Discovery” stack
In this topic series, we’ve talked about how modern discovery stacks are becoming modular: you don’t pick one model; you compose several.
MorphDiff naturally becomes a middle layer:
- Upstream: You have perturbations (compounds, CRISPR edits) with gene expression signatures.
- Middle: MorphDiff generates morphology (images and/or embeddings) as a richer phenotype proxy.
- Downstream: You run clustering, MOA retrieval, triage, and prioritization to decide what gets real imaging and follow-up assays.
If your phenotypic screening pipeline is constrained by imaging throughput or budget, this is an obvious place to experiment.
How to use predicted morphology without getting fooled
Predicted phenotypes are powerful, and they’re also easy to misuse. Here’s what works in practice if your goal is fewer wasted experiments (not replacing the lab).
1) Use it as a triage layer, not a verdict
The highest-value workflow is:
- Generate predicted morphologies for a large set of candidates
- Cluster them against a reference atlas of known mechanisms
- Select a smaller set for real imaging and orthogonal validation
This is the same pattern that works across AI in drug discovery: use AI to narrow the search, then spend your wet-lab budget where the posterior probability is highest.
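A minimal version of that triage step, assuming you already have predicted-morphology embeddings and a labeled reference atlas; the nearest-neighbor confidence rule is a deliberately simple stand-in for whatever scoring your team trusts:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def triage_for_imaging(pred_emb, atlas_emb, atlas_labels, budget=50):
    """Assign each predicted morphology to its nearest reference-atlas
    entry, then spend the imaging budget on the tightest matches first.
    Returns (candidate_index, putative_mechanism, distance) tuples."""
    knn = NearestNeighbors(n_neighbors=1).fit(atlas_emb)
    dist, idx = knn.kneighbors(pred_emb)
    order = np.argsort(dist[:, 0])[:budget]   # most confident matches
    return [(int(i), atlas_labels[idx[i, 0]], float(dist[i, 0])) for i in order]
```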
2) Track “distance from training distribution” explicitly
Generalization drops when perturbations don’t resemble what the model has seen. Teams should operationalize that risk instead of hand-waving it.
Practical checks:
- Compute embedding distances to nearest neighbors in training data
- Flag novel chemistry classes or rare pathway signatures
- Require confirmation imaging for any high-impact decision when novelty is high
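The first two checks compress into a few lines, assuming you can embed both training perturbations and new candidates; the kNN-distance score and the 95th-percentile threshold are illustrative defaults to tune:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def novelty_flags(candidate_emb, train_emb, k=10, q=0.95):
    """Score = mean distance to the k nearest training embeddings.
    Calibrate a threshold on the training set itself (dropping the
    zero-distance self-match), then flag candidates beyond it."""
    knn = NearestNeighbors(n_neighbors=k + 1).fit(train_emb)
    d_train, _ = knn.kneighbors(train_emb)
    threshold = np.quantile(d_train[:, 1:].mean(axis=1), q)
    d_cand, _ = knn.kneighbors(candidate_emb, n_neighbors=k)
    scores = d_cand.mean(axis=1)
    return scores, scores > threshold   # True = require confirmation imaging
```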
3) Treat time and dose as first-class citizens
A real limitation in this line of work is that many datasets don’t encode timepoint and concentration in a form models can reliably learn from.
If you’re piloting this internally, it’s worth designing data collection so you can condition on:
- Dose (with consistent units and ranges)
- Time post-perturbation
- Cell line context (and ideally growth conditions)
You’ll get models that are less “one snapshot” and more “response surface.” That’s the difference between a cool demo and a tool discovery scientists will keep using.
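In model terms, this just means a richer conditioning vector. A hypothetical construction (the specific encodings, such as log-dose and a 48-hour normalization, are design choices to adapt, not MorphDiff's):

```python
import numpy as np

CELL_LINES = ["A549", "U2OS", "MCF7"]   # whatever your panel actually is

def make_condition(expr_sig, dose_um, hours, cell_line):
    """Concatenate the perturbation signature with experimental context:
    log10 dose (micromolar), time normalized to a 48 h window, and a
    one-hot cell-line indicator."""
    log_dose = np.log10(dose_um + 1e-9)
    t = hours / 48.0
    line = np.eye(len(CELL_LINES))[CELL_LINES.index(cell_line)]
    return np.concatenate([expr_sig, [log_dose, t], line])

cond = make_condition(np.zeros(978), dose_um=10.0, hours=24, cell_line="U2OS")
```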
4) Chain models thoughtfully (expression prediction → morphology prediction)
MorphDiff requires perturbed gene expression as input. In real pipelines, you won’t always have that.
A realistic architecture is:
- Model A predicts gene expression for unseen perturbations (from chemical structure and context)
- MorphDiff (Model B) predicts morphology from predicted expression
- A retrieval/prioritization layer makes decisions with uncertainty estimates
Chaining adds error, but it also expands coverage dramatically. If you manage uncertainty and validate on held-out chemical series, it’s a practical trade.
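A sketch of that chain with the simplest possible uncertainty estimate: sample both stages several times and measure the spread. Here model_a and model_b are hypothetical stochastic callables, standing in for whatever models you deploy:

```python
import numpy as np

def chained_prediction(perturbation, model_a, model_b, n_samples=8):
    """Run the chain end to end several times; the spread of the
    resulting morphology embeddings is a crude confidence signal."""
    embs = np.stack([model_b(model_a(perturbation)) for _ in range(n_samples)])
    mean = embs.mean(axis=0)
    spread = np.linalg.norm(embs - mean, axis=1).mean()
    return mean, spread   # high spread -> route to real imaging, not auto-triage
```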
From cells to grids: why energy & utilities should care
Drug discovery isn’t the only industry stuck between expensive experiments and high-stakes decisions. Energy and utilities face the same shape of problem:
- Changes are costly (field deployments, substation upgrades, DER programs).
- Real-world experimentation can be slow (seasonality, regulatory timelines, safety constraints).
- Outcomes depend on interacting variables (weather, demand, network topology, human behavior).
Here’s the bridge: MorphDiff is essentially a conditional simulator. It takes a compact “molecular readout” (transcriptome) and generates a high-dimensional “system response” (morphology).
Energy teams are building the same pattern:
- Condition: forecasts + sensor data + topology + asset health signals
- Generate: expected network state under an intervention (switching, setpoint changes, new control policies)
- Decide: what to deploy, where, and in what order
The mindset shift is identical: simulate the future of your system before you commit resources in the real world.
A concrete analogy: MOA retrieval vs grid intervention retrieval
MOA retrieval asks: “What known mechanism does this new compound resemble?”
A grid analog asks: “What prior intervention produced a similar system response?” Examples:
- Which voltage optimization strategy previously reduced losses without triggering power quality events?
- Which demand response design shifted peak load and avoided rebound effects?
- Which maintenance action preceded a measurable reduction in transformer thermal stress?
In both cases, you’re searching a library of historical outcomes—and your retrieval improves when you can represent the system response richly (morphology embeddings in biology; state/behavior embeddings in grids).
What to pilot in the next 90 days (practical, not theoretical)
If you’re in pharma/biotech and want to test AI-generated morphology without boiling the ocean, here’s a pilot plan that’s small enough to run and strict enough to be meaningful.
- Pick one constrained domain. One cell line, one assay setup, one plate protocol, and a focused perturbation set (e.g., a pathway-focused compound collection).
- Define the decision you want to improve. Examples: reduce imaging volume by 30%; improve MOA hit rate in the top-50 prioritized compounds.
- Create a paired holdout that reflects reality. Hold out entire chemical series or targets, not random wells. Random splits flatter models. (See the split sketch after this list.)
- Evaluate on embeddings and retrieval, not just image metrics. Image realism metrics are weak proxies. Your KPI is: do we find the right neighbors and make better choices?
- Build a human-in-the-loop review. Give biologists “control vs predicted perturbed” deltas and feature shifts, then track which predictions they trust—and whether that trust is earned.
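For the holdout point in particular, scikit-learn's group-aware splitters make series-level holdouts a one-liner; the series IDs below are simulated stand-ins for real compound-registration metadata:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(1000, 978)                  # e.g., expression profiles
series = np.random.randint(0, 40, size=1000)   # chemical-series ID per sample

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=series))
assert set(series[train_idx]).isdisjoint(series[test_idx])   # no series leakage
```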
This is also where cross-industry teams can learn from energy deployments: utilities tend to be disciplined about operational validation, rollback plans, and monitoring drift. Drug discovery should borrow that rigor.
Where this goes next
Generative AI in drug discovery is moving from “generate molecules” to “generate outcomes.” MorphDiff is a clean example: it treats cell morphology as something you can model and simulate, not only measure.
The bigger opportunity is combining modalities into a single decision loop: chemical structure, predicted expression, predicted morphology, and real follow-up assays feeding back into a continuously improving system.
If you could simulate the biological “after” state well enough to pick the right 5% of experiments, what would you do with the other 95% of your budget—run more confirmatory biology, broaden target space, or shorten cycle time? That’s the question teams should be arguing about going into 2026.