Predict Cell Morphology From Gene Data Before Experiments

Artificial Intelligence & Robotics: Transforming Industries Worldwide
By 3L3C

Predict cell morphology from gene expression before lab work. See how diffusion models can prioritize screening and improve MOA discovery.

drug discovery, generative AI, diffusion models, phenotypic screening, computational biology, cell imaging



Drug discovery has a dirty secret: the microscope is often the bottleneck.

If your team runs phenotypic screening, you already know the tradeoff. High-content imaging can reveal mechanism-of-action (MOA) signals that chemistry and transcriptomics miss. But imaging everything is unrealistic—plates, reagents, instrument time, and analyst hours add up fast. Meanwhile, public perturbation datasets keep growing, especially gene expression profiles, creating a weird imbalance: we’re rich in molecular readouts and poor in paired cell images.

A 2025 Nature Communications paper introduces a practical way to close that gap: generate the “after” microscopy images directly from perturbed gene expression. The model—called MorphDiff—uses a diffusion approach conditioned on L1000 transcriptomic signatures. The promise is straightforward and very 2025: preview cellular morphology before you run the experiment, then spend real lab budget only where it matters. In the context of our “Artificial Intelligence & Robotics: Transforming Industries Worldwide” series, this is a clean example of AI doing what automation has always done best—turning scarce, expensive operations into targeted, high-confidence work.

The problem: imaging doesn’t scale, but hypotheses do

Phenotypic screening scales poorly because imaging is expensive and slow, while the hypothesis space explodes. A modern discovery pipeline might want to test:

  • Thousands of small molecules across multiple doses and timepoints
  • Hundreds to thousands of gene edits (CRISPR KO/CRISPRi/CRISPRa)
  • Combinations (drug + edit, drug + drug)
  • Multiple cell lines for translational relevance

Even “high-throughput imaging” hits a wall when you multiply conditions. The reality? Many teams end up imaging a narrow slice of the space and hoping it includes the winners.

Transcriptomics is different. The L1000 ecosystem (popularized through LINCS) made perturbed gene expression more accessible at scale, including large public libraries. So you often do have a gene signature for a compound or perturbation—even when you can’t afford to image it.

MorphDiff builds directly on a reasonable biological stance: gene expression changes drive downstream protein and pathway activity, which eventually shapes what cells look like (organelle texture, cell size, intensity patterns, spatial organization). The mapping isn’t one-to-one, but there’s enough shared signal to learn.

If imaging is your “gold standard,” transcriptomics can be your “proposal.” MorphDiff tries to turn that proposal into a visual preview you can act on.

What MorphDiff actually does (and why diffusion is a good fit)

MorphDiff generates multi-channel microscopy images conditioned on gene expression. It learns from datasets where both are available, then uses only the L1000 vector to generate plausible post-perturbation morphology.

Two modes that match real lab workflows

1) Gene-to-image (G2I): Start from noise, then generate an image that matches the transcriptomic signature.

  • Useful when you don’t have a “before” image or you want a clean synthetic sample

2) Image-to-image (I2I): Start from a control image (vehicle/control condition), then transform it into the predicted perturbed state using the transcriptomic condition.

  • Useful when you want relative change from a specific baseline
  • The paper uses an SDEdit-style procedure so this can work without retraining, which matters operationally
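The SDEdit-style idea is simple enough to sketch: noise the control image partway into the diffusion process, then run the reverse (denoising) steps conditioned on the transcriptomic signature. The sketch below is a toy version with a linear noise schedule; `denoise_step` stands in for the trained conditional diffusion model, which is not reproduced here.

```python
import numpy as np

def sdedit_transform(control_latent, denoise_step, cond, t_start=0.5,
                     n_steps=50, seed=0):
    """SDEdit-style image-to-image: partially noise the control latent,
    then denoise it under the transcriptomic condition.
    `denoise_step(x, t, cond)` is a placeholder for the trained model."""
    rng = np.random.default_rng(seed)
    # Linear alpha_bar schedule for the sketch (real models use learned
    # or cosine schedules).
    alpha_bars = np.linspace(1.0, 0.01, n_steps)
    k = int(t_start * n_steps)        # how far into the noise process to jump
    ab = alpha_bars[k]
    noise = rng.standard_normal(control_latent.shape)
    # Forward noising q(x_k | x_0): keep part of the control, add noise.
    x = np.sqrt(ab) * control_latent + np.sqrt(1 - ab) * noise
    # Reverse from step k back to 0, each step conditioned on the L1000 vector.
    for t in range(k, -1, -1):
        x = denoise_step(x, t, cond)
    return x
```

Because the control image is only partially noised, the output stays anchored to the original cell layout while the conditioning pulls it toward the predicted perturbed phenotype, and no retraining is needed to switch between G2I and I2I modes.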

The architecture in plain terms

MorphDiff combines two components:

  • Morphology VAE (MVAE): Compresses 5-channel microscope images into a compact latent space and reconstructs them with high perceptual fidelity.
  • Latent Diffusion Model: Runs diffusion in latent space (faster, less memory) and uses attention over the L1000 gene vector at each denoising step.

Diffusion earns its keep here because biological imaging data is noisy and multi-modal (the same perturbation can produce a distribution of morphologies). Diffusion models are naturally good at modeling distributions rather than single “average” outputs.
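To make "attention over the L1000 gene vector" concrete, here is a minimal single-head cross-attention step in numpy: image latent tokens act as queries, and a projected gene-expression embedding supplies keys and values. Shapes and projection matrices are illustrative, not the paper's actual dimensions.

```python
import numpy as np

def cross_attention(latent_tokens, gene_embed, Wq, Wk, Wv):
    """Toy single-head cross-attention: image latent tokens (queries)
    attend to gene-expression condition tokens (keys/values)."""
    Q = latent_tokens @ Wq                 # (n_tokens, d)
    K = gene_embed @ Wk                    # (n_cond, d)
    V = gene_embed @ Wv                    # (n_cond, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Softmax over the condition tokens.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                     # conditioned update per image token
```

In a latent diffusion model this operation is applied at each denoising step, which is how the transcriptomic signature steers generation without being concatenated to the image itself.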

“Pretty pictures” aren’t enough: how biological fidelity is tested

The hard part isn’t image realism—it’s scientific usefulness. A synthetic cell image that looks plausible but breaks downstream analysis is worse than useless because it can mislead.

The paper’s evaluation goes beyond visual metrics and asks: Do generated images preserve the signals biologists and screening teams actually use?

Generative quality on unseen perturbations

MorphDiff is benchmarked against GAN and diffusion baselines on standard metrics (e.g., FID, Inception Score, coverage/density, and CLIP-based CMMD). Across large drug and genetic datasets (including JUMP-style genetic perturbations and CDRP/LINCS-style drug perturbations), the model ranks at or near the top—particularly on out-of-distribution (OOD) perturbations, which is where this becomes valuable in the real world.

That OOD point is the whole story. If a model only recreates what it has seen, it’s a dataset compression trick. If it generalizes to new drugs and edits, it becomes a prioritization engine.

Feature-level validation: CellProfiler-style morphology features

Here’s the more convincing evidence: the authors extract hundreds of interpretable morphology features (textures, intensities, granularity, cross-channel correlations) and compare distributions.

They report that over 70% of generated feature distributions are statistically indistinguishable from real ones (as evaluated in their setup). More importantly, the model captures:

  • Which features change most under strong perturbations
  • Directionality vs control (not just “different,” but different in the right way)
  • Correlation structure between gene expression and morphology features

That last bullet matters because it suggests the model is learning relationships rather than painting a style.
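A feature-level check like this is easy to reproduce on your own data. The sketch below computes a two-sample Kolmogorov-Smirnov statistic per feature column and reports the fraction of features whose real and generated distributions stay close; the threshold is illustrative, not the paper's test procedure.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def fraction_indistinguishable(real_feats, gen_feats, threshold=0.1):
    """Share of morphology feature columns whose real vs generated
    distributions fall within a KS-distance threshold (illustrative)."""
    stats = [ks_statistic(real_feats[:, j], gen_feats[:, j])
             for j in range(real_feats.shape[1])]
    return float(np.mean(np.array(stats) < threshold))
```

Running this across CellProfiler-style feature matrices gives a single, hard-to-game number for "does the generator preserve the feature distributions screening teams rely on."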

Downstream utility: MOA retrieval that approaches real imaging

MOA retrieval is a practical test: given a query perturbation profile, can you find reference drugs that share the same mechanism?

MorphDiff-generated morphology profiles:

  • Beat prior image-generation baselines
  • Beat transcriptome-only retrieval
  • Approach retrieval performance from real images

The paper reports an average improvement of 16.9% over the strongest baseline and 8.0% over transcriptome-only in top-k retrieval experiments (across several k values and metrics like mean average precision).

This is the best argument for synthetic morphology: it adds information that transcriptomics alone doesn’t capture, enough to help cluster mechanisms even when structures differ.
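The retrieval setup itself is straightforward to prototype. This sketch ranks reference perturbations by cosine similarity of morphology profiles and scores precision@k on shared MOA labels; it is a simplified stand-in for the paper's full evaluation (which also reports mean average precision across several k values).

```python
import numpy as np

def topk_moa_precision(query, reference, query_moa, ref_moa, k=5):
    """Rank references by cosine similarity to each query profile and
    compute mean precision@k on mechanism-of-action labels."""
    qn = query / np.linalg.norm(query, axis=1, keepdims=True)
    rn = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    sims = qn @ rn.T                          # (n_query, n_ref)
    hits = []
    for i, row in enumerate(sims):
        top = np.argsort(-row)[:k]            # k most similar references
        hits.append(np.mean([ref_moa[j] == query_moa[i] for j in top]))
    return float(np.mean(hits))
```

You can run the same function on transcriptome-only profiles, generated morphology profiles, and real-image profiles to see for yourself whether synthetic morphology closes the gap to real imaging.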

Where this fits in “AI & robotics transforming industries”

Most companies get AI transformation backwards: they chase the flashiest demos instead of attacking the real bottlenecks.

In drug discovery and biotech, the bottleneck is often physical: plates, microscopes, lab throughput, and the humans required to run and interpret everything. MorphDiff sits right at the junction of AI and lab automation:

  • AI generates a high-confidence shortlist of what to run
  • Robotics and lab automation execute the validated experiments faster
  • Humans spend time on interpretation, not brute-force screening

Think of this as decision automation rather than lab replacement. The microscope still matters. But you stop using it as a first pass for every single candidate.

Practical ways teams can use transcriptome-to-morphology models

The fastest ROI comes from using synthetic morphology as a triage layer, not as a substitute for validation. Here are workflows I’d actually advocate for a screening or translational team.

1) Imaging budget prioritization (the obvious win)

If you have a large perturbation library with L1000 signatures but limited imaging capacity:

  1. Generate predicted morphologies for all candidates
  2. Convert them into morphology embeddings/features
  3. Cluster and rank by:
    • similarity to known mechanisms
    • novelty (far from existing clusters)
    • predicted strength of phenotype (distance from control)
  4. Image only the top candidates per cluster for confirmation

This aligns with how industrial automation creates value: reduce wasted runs and maximize utilization of expensive instruments.
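Steps 3 and 4 of that workflow reduce to a scoring function. Here is a minimal sketch that ranks candidates by predicted phenotype strength (distance from control in embedding space) and novelty (distance from the nearest known-mechanism embedding); the equal weighting is an assumption you would tune.

```python
import numpy as np

def prioritize_for_imaging(embeddings, control_embed, known_embeds,
                           top_n=10, w_strength=0.5, w_novelty=0.5):
    """Rank candidates for imaging by (a) predicted phenotype strength,
    i.e. distance from the control embedding, and (b) novelty, i.e.
    distance from the nearest known-mechanism cluster. Weights are
    illustrative."""
    strength = np.linalg.norm(embeddings - control_embed, axis=1)
    # Distance to the closest known-mechanism embedding, per candidate.
    novelty = np.min(
        np.linalg.norm(embeddings[:, None, :] - known_embeds[None, :, :],
                       axis=2),
        axis=1)

    def z(x):  # z-score so the two terms are on a comparable scale
        return (x - x.mean()) / (x.std() + 1e-9)

    score = w_strength * z(strength) + w_novelty * z(novelty)
    return np.argsort(-score)[:top_n]      # best candidates first
```

Swapping in a "similarity to known mechanisms" term instead of novelty turns the same function into a mechanism-confirmation ranker, so one scoring scaffold covers both discovery and validation runs.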

2) “Explain the delta” for hit follow-up

The I2I mode is tailor-made for hit triage meetings.

Instead of showing only a gene signature plot, you can show:

  • control image
  • predicted perturbed image
  • the implied feature shifts (e.g., mitochondria intensity up, ER texture altered)

That’s not proof, but it’s a sharper hypothesis. And sharper hypotheses move faster through validation.
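The "implied feature shifts" slide can be generated mechanically: compare mean feature values between control and predicted images and report the largest movers with direction. The feature names below are hypothetical examples, not a fixed schema.

```python
import numpy as np

def explain_delta(control_feats, predicted_feats, feature_names, top_n=3):
    """Summarize the largest predicted morphology-feature shifts vs
    control, with direction. Feature names are caller-supplied."""
    delta = predicted_feats.mean(axis=0) - control_feats.mean(axis=0)
    order = np.argsort(-np.abs(delta))[:top_n]   # biggest movers first
    return [(feature_names[i],
             "up" if delta[i] > 0 else "down",
             float(delta[i]))
            for i in order]
```

The output reads directly as a triage-meeting bullet list ("mitochondria intensity up, ER texture down"), which is exactly the sharper-hypothesis format the I2I mode enables.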

3) Mechanism hunting for structurally diverse libraries

Traditional similarity searches struggle when compounds are structurally unrelated. Phenotypes can cut across chemistry.

A practical play:

  • Use synthetic morphology to build a “phenotypic neighborhood map”
  • Flag compounds that land near validated reference mechanisms
  • Prioritize wet-lab follow-up for unexpected neighbors (often where repurposing opportunities hide)

4) Fill gaps in multimodal datasets

Many organizations have patchy datasets: some conditions have images, others only transcriptomics.

Synthetic morphology can help standardize the feature space for modeling—as long as you label synthetic vs real and avoid training/evaluating in ways that leak bias. Treat generated data as an assistive modality, not ground truth.
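The provenance-labeling discipline is cheap to enforce in code. A minimal sketch, assuming row-per-condition records, tags every profile with its source and builds an evaluation set that excludes synthetic rows:

```python
def merge_with_provenance(real_rows, synthetic_rows):
    """Combine real and generated morphology profiles into one table,
    tagging each row's provenance so downstream evaluation can exclude
    synthetic data. Rows are plain dicts in this sketch."""
    merged = ([{**r, "source": "real"} for r in real_rows]
              + [{**r, "source": "synthetic"} for r in synthetic_rows])
    # Never evaluate a model against data another model generated.
    eval_set = [r for r in merged if r["source"] == "real"]
    return merged, eval_set
```

Keeping the flag on every row (rather than in a separate manifest) makes it much harder for synthetic data to leak into a held-out set during later refactors.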

Limitations you should take seriously (and how to plan around them)

Diffusion inference is still slower than most people want. If you’re generating at scale, you’ll care about sampler efficiency and batching.

Other constraints from the paper are operationally important:

  • Time and concentration aren’t explicit inputs (often missing in paired datasets). In real screening, dose-response and temporal dynamics are the story.
  • You need perturbed gene expression to condition the model. No transcriptome measurement means no MorphDiff prediction.
  • Generalization fades off-distribution. If your cell line, staining protocol, imaging setup, or perturbation types differ too much from training, you’ll see drift.

Here’s the realistic strategy:

  • Start with one cell line + one assay where you already have some paired imaging and transcriptomics
  • Validate synthetic morphology by feature distributions and MOA retrieval, not by eyeballing
  • Deploy as a ranking tool (prioritization), then expand coverage once you’ve measured ROI

A near-term extension the authors point to is chaining: predict gene expression for unseen perturbations (e.g., using gene-expression prediction models) and then feed that into morphology generation. That’s how this becomes a broader “in-silico screening stack.”

What changes in 2026 planning: fewer plates, better questions

If you’re setting budgets for 2026, transcriptome-to-morphology modeling supports a different operating model:

  • Run fewer experiments, but make each one more diagnostic
  • Use AI to propose phenotypic hypotheses early
  • Use robotics to execute confirmatory assays quickly and consistently

The bigger shift is cultural: teams stop treating imaging as a universal first step and start treating it as a targeted validator.

If your organization is already investing in AI for healthcare and drug discovery, this is exactly the kind of capability to pilot: it compresses iteration time, reduces screening spend, and improves hit selection quality—without asking you to abandon experimental reality.

Where does this go next: will your team treat synthetic morphology as a curiosity, or as the decision layer that tells your robots what’s worth running tomorrow morning?