
AI Compute Margins: Why B2B SaaS Still Feels Squeezed
OpenAI reportedly pushed its compute margin to ~70% by October 2025, up from roughly 35% in early 2024. That jump matters because it signals something founders and operators in U.S. tech have been waiting for: the AI infrastructure layer is learning how to run more like software and less like an expensive services business.
But here’s the part most teams find out the hard way. Even if the model providers get healthier, application-layer B2B startups can still watch their gross margins stall, or even shrink. The reason isn’t mysterious. It’s math: per-token prices may drop while tokens-per-task keeps climbing.
This post is part of our “AI in Cloud Computing & Data Centers” series, where we look at how AI workloads reshape infrastructure decisions—from GPU capacity planning to energy efficiency to the unit economics that decide which products survive.
OpenAI’s 70% compute margin is real—and still easy to misread
Answer first: A higher compute margin at the foundation layer doesn’t mean AI apps will automatically become high-margin businesses.
A compute margin is a narrow slice of the picture: revenue minus the direct infrastructure cost to serve inference (and sometimes training allocations, depending on how a company reports). When that number rises from ~35% to ~70%, it typically reflects a mix of:
- Better utilization (higher GPU occupancy, less idle capacity)
- Systems optimization (batching, caching, quantization, kernel tuning)
- Model efficiency (smarter architectures and serving strategies)
- Commercial improvements (enterprise commitments and steadier demand)
At the data center level, this is exactly what you’d expect as AI workloads mature: schedulers improve, inference stacks optimize, and hardware procurement gets less chaotic. The U.S. cloud ecosystem—hyperscalers, colos, and private GPU clouds—has been steadily building muscle here.
The mistake is assuming that your AI SaaS product inherits those gains by default.
The “older models get cheaper” pattern vs. “frontier tasks get pricier”
Answer first: Inference cost declines show up fastest on established models; frontier behavior often pushes cost per outcome higher.
Teams love a clean story: “tokens are getting cheaper, so margins will expand.” That story only holds if:
- Your product can stick with older/cheaper models, and
- Your users don’t demand the newest reasoning-heavy features
In practice, the market pulls you forward. Customers buying AI for real work (support, sales, compliance, engineering) want fewer mistakes, better tool use, and more autonomy. Those improvements often mean more steps, more context, more calls, more tokens.
The treadmill problem: why cost per task rises even when token prices fall
Answer first: The unit that matters for B2B apps is rarely “cost per token.” It’s cost per completed task.
Agentic workflows are the biggest reason AI feels like a treadmill for B2B startups. A workflow that used to be “one prompt, one completion” becomes:
- retrieve context (RAG)
- ask clarifying questions
- draft output
- run tool calls (CRM, ticketing, codebase, docs)
- validate
- revise
- produce final output
Each step is more compute. Multiply by millions of tasks per month and you get the uncomfortable truth:
Per-token costs can go down while total inference spend goes up.
One widely cited pattern in 2024–2025: reasoning-oriented models may emit 10× more tokens for the same apparent output quality because they “think” in text or run multi-step processes. You pay for that behavior—directly.
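A back-of-the-envelope example makes the treadmill concrete. Every number below is an illustrative assumption, not a quote from any provider:

```python
# Illustrative, assumed numbers -- not real provider pricing.
price_per_1k_2024 = 0.010   # hypothetical blended $/1K tokens, early 2024
price_per_1k_2025 = 0.004   # 60% cheaper per token a year later

tokens_per_task_2024 = 2_000    # "one prompt, one completion"
tokens_per_task_2025 = 14_000   # agentic: RAG + tools + validate + revise

cost_2024 = tokens_per_task_2024 / 1_000 * price_per_1k_2024
cost_2025 = tokens_per_task_2025 / 1_000 * price_per_1k_2025

print(f"2024 cost per task: ${cost_2024:.3f}")   # $0.020
print(f"2025 cost per task: ${cost_2025:.3f}")   # $0.056 -- 2.8x higher despite 60% cheaper tokens
```

Cheaper tokens, pricier tasks. That's the whole treadmill in four variables.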
Why this hits U.S. B2B SaaS especially hard
Answer first: U.S. B2B buyers reward “better outcomes,” not “cheaper tokens,” so vendors compete on capability and absorb cost.
In the U.S. enterprise software market, procurement and renewal conversations aren’t usually about how efficient your inference stack is. They’re about:
- accuracy and compliance
- time-to-resolution
- conversion lift
- reduced headcount pressure
- auditability and governance
So vendors race to ship more advanced features. That race often increases inference load faster than falling token prices can offset it.
If you’re selling into competitive categories (AI SDRs, AI support agents, AI coding assistants), the treadmill speeds up. Standing still looks like falling behind.
Gross margin reality at the application layer: the uncomfortable benchmarks
Answer first: Early AI application companies often run materially below traditional SaaS gross margins—and some go negative.
Classic SaaS economics are forgiving: marginal cost trends toward zero. AI software doesn’t behave like that unless you actively force it to.
Recent datasets and investor discussions in 2025 have shown patterns such as:
- fast-growing AI apps starting around ~25% gross margin
- more disciplined companies trending closer to ~60%
- many cases of negative gross margins when usage is heavy and pricing is flat
Traditional SaaS investors expect 75%+ gross margin. That gap changes everything:
- valuation expectations
- payback periods
- CAC limits
- how aggressively you can scale
This is where “AI in cloud computing & data centers” becomes more than infrastructure talk. Your product strategy and your GPU bill are now coupled.
The hidden “AI tax” spreading across software
Answer first: Even non-AI-native SaaS is seeing margin compression as AI features become table stakes.
Plenty of established SaaS vendors are adding copilots, summarization, and workflow automation. They’re discovering a new variable cost line item that behaves more like telecom minutes than web hosting.
The response we’re seeing across U.S.-based digital services is predictable:
- blended pricing (subscription + usage)
- tiering based on “AI actions” or “AI credits”
- add-ons for premium models
If you still price AI as “unlimited,” you’re choosing to subsidize power users.
What actually improves AI gross margins (without pretending you’re OpenAI)
Answer first: The winning playbook is operational discipline: routing, product design, and pricing that matches your cost curve.
Most teams can’t build their own foundation model stack. That doesn’t mean you’re stuck. It means you need to act like an infrastructure-aware SaaS company.
1) Model routing: stop paying frontier prices for routine work
Answer first: Route the cheapest model that clears your quality bar; reserve frontier models for exceptions.
Model routing is the practical bridge between product quality and cloud cost optimization. You can route by:
- intent (simple Q&A vs. multi-step reasoning)
- user tier (free vs. enterprise)
- risk (regulated output vs. internal draft)
- confidence (self-check scores, eval thresholds)
A simple routing layer often produces immediate savings because a large share of requests don’t need the most expensive reasoning model.
Operational tip I’ve found useful: track “% of requests sent to frontier” weekly, like you’d track cloud spend. If that percentage drifts up over time, your margins will quietly erode.
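To make that concrete, here's a minimal routing sketch. The model names, prices, and thresholds are all assumptions; in production, `route` would sit in front of your provider SDK calls:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    model: str           # hypothetical model identifiers, not real API names
    cost_per_1k: float   # assumed $/1K tokens, used only for tracking

CHEAP = Route("small-fast-model", 0.0005)
FRONTIER = Route("frontier-reasoning-model", 0.015)

def needs_frontier(request: dict) -> bool:
    """Escalate only when the cheap path can't clear the quality bar."""
    return (
        request.get("multi_step_reasoning", False)       # intent
        or request.get("risk", "low") == "regulated"     # risk
        or request.get("self_check_score", 1.0) < 0.7    # confidence (assumed threshold)
    )

def route(request: dict) -> Route:
    return FRONTIER if needs_frontier(request) else CHEAP

# Track "% of requests sent to frontier" weekly, like cloud spend.
requests = [
    {"multi_step_reasoning": False, "risk": "low", "self_check_score": 0.9},
    {"multi_step_reasoning": True,  "risk": "low", "self_check_score": 0.9},
    {"multi_step_reasoning": False, "risk": "regulated", "self_check_score": 0.9},
]
frontier_share = sum(route(r) is FRONTIER for r in requests) / len(requests)
print(f"frontier share: {frontier_share:.0%}")   # 67% here; alert when this drifts up
```

The policy itself is boring on purpose. The value is that escalation is now an explicit, measurable decision instead of a default.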
2) Engineer for fewer tokens, not prettier prompts
Answer first: Token minimization is a product feature, not just an infra tweak.
High-impact moves include:
- compressing context (summaries, embeddings, structured memory)
- reducing chatty agents (hard limits on turns and retries)
- caching frequent answers
- using structured outputs to cut verbose completions
In data center terms, you’re doing demand shaping: reducing peak load rather than only adding capacity.
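As a sketch of what hard limits and caching can look like together (the `run_step` and `good_enough` callables stand in for your model call and your eval check, and the turn budget is an assumption):

```python
import hashlib

MAX_TURNS = 4                  # hard cap on agent turns (assumed budget)
_cache: dict[str, str] = {}    # cache for frequent, repeatable answers

def _key(prompt: str) -> str:
    return hashlib.sha256(prompt.encode()).hexdigest()

def answer(prompt: str, run_step, good_enough) -> str:
    """Bounded agent loop: never more than MAX_TURNS model calls per task."""
    key = _key(prompt)
    if key in _cache:          # repeated prompt -> zero new tokens
        return _cache[key]

    draft = ""
    for _ in range(MAX_TURNS):         # the hard limit is the feature
        draft = run_step(prompt, draft)
        if good_enough(draft):
            break                      # stop paying for turns you don't need

    _cache[key] = draft
    return draft
```

An unbounded loop with a retry policy is a blank check. A bounded loop with an eval gate is a budget.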
3) Price for outcomes and variability (and be explicit about it)
Answer first: If your costs scale with usage, your pricing has to scale with usage too.
A workable approach in B2B is:
- base subscription for platform access
- included usage credits/allowances
- overage charges for heavy usage
- higher-priced tiers tied to premium models, faster latency, or larger context
This doesn’t have to feel punitive. Buyers accept usage-based pricing when it maps to value: “AI actions,” “tickets resolved,” “documents reviewed,” “repos indexed.”
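Here's what that structure looks like as arithmetic. Every number is made up for illustration; the point is that margin holds even for heavy users:

```python
# Assumed plan: $500/mo base, 1,000 included AI actions, $0.40 per overage action.
BASE_FEE = 500.00
INCLUDED_ACTIONS = 1_000
OVERAGE_PRICE = 0.40
COST_PER_ACTION = 0.12   # your blended inference cost per action (assumed)

def monthly_economics(actions: int) -> tuple[float, float]:
    overage = max(0, actions - INCLUDED_ACTIONS)
    revenue = BASE_FEE + overage * OVERAGE_PRICE
    margin = (revenue - actions * COST_PER_ACTION) / revenue
    return revenue, margin

for actions in (400, 1_000, 5_000):
    revenue, margin = monthly_economics(actions)
    print(f"{actions:>5} actions -> ${revenue:,.0f} revenue, {margin:.0%} gross margin")
# At 5,000 actions: $2,100 revenue, ~71% margin.
# On a flat $500 plan, that same user would cost $600 to serve -- negative margin.
```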
4) Build margin beyond tokens: attach high-margin services to AI value
Answer first: The healthiest AI businesses earn margin on the platform around the model, not on the model call itself.
Think of AI as the entry point into higher-margin layers:
- workflow management
- integrations and admin controls
- data connectors and governance
- hosting, storage, deployment, monitoring
- marketplace take rates
If your business model is “API cost + markup,” you’re vulnerable to:
- provider price changes
- competitors with better procurement
- customers negotiating you down
A stronger model is “AI drives adoption; platform drives retention and margin.”
5) The nuclear option: partial verticalization (the realistic version)
Answer first: Most startups shouldn’t train frontier models, but many can fine-tune or serve smaller open models for common cases.
A practical middle ground:
- fine-tune open models for narrow tasks
- host them on reserved capacity
- keep frontier APIs for edge cases
This is where cloud infrastructure choices matter. Reserved instances, GPU leasing, and region selection can materially change your cost per task—especially at scale.
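A quick way to sanity-check the middle ground: compare cost per task on reserved capacity against the API at realistic utilization. All inputs below are assumptions; swap in your own quotes and benchmarks:

```python
# Assumed inputs -- replace with your own quotes and throughput benchmarks.
GPU_HOURLY_RATE = 2.50       # reserved GPU, $/hour
TASKS_PER_GPU_HOUR = 600     # fine-tuned small model at full occupancy
UTILIZATION = 0.60           # realistic occupancy, not theoretical peak
API_COST_PER_TASK = 0.012    # frontier API, blended per-task cost

self_hosted = GPU_HOURLY_RATE / (TASKS_PER_GPU_HOUR * UTILIZATION)
breakeven_util = GPU_HOURLY_RATE / (TASKS_PER_GPU_HOUR * API_COST_PER_TASK)

print(f"self-hosted: ${self_hosted:.4f}/task vs API: ${API_COST_PER_TASK:.4f}/task")
print(f"breakeven utilization: {breakeven_util:.0%}")
# ~$0.0069 vs $0.0120 per task here, with breakeven near 35% utilization:
# below that, reserved capacity is paying for idle silicon and the API wins.
```

Utilization is the whole game. The math only favors self-hosting if you can keep the GPUs busy.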
How this ties back to cloud computing and data centers
Answer first: AI margins are now a cloud architecture problem as much as a finance problem.
If you’re building AI-powered digital services in the U.S., your “gross margin plan” is also your:
- GPU capacity plan (on-demand vs. reserved vs. hybrid)
- latency plan (regional serving, caching layers)
- reliability plan (fallback models, multi-provider strategy)
- energy/cost plan (utilization targets, batch windows)
The infrastructure layer is getting more efficient—OpenAI’s reported compute margin improvement is a strong signal of that. Yet the application layer only benefits when it adopts the same discipline: measure, route, constrain, and price.
Practical checklist: a 30-day margin reset for AI B2B teams
Answer first: You can usually find savings and pricing clarity in a month if you instrument the right metrics.
- Define “cost per task” for your top 3 workflows (not cost per request).
- Add routing rules so the default isn’t your most expensive model.
- Set hard caps on agent loops (max turns, max tool calls, max retries).
- Ship caching for repeated prompts and boilerplate outputs.
- Introduce allowances (credits) and overages for heavy usage.
- Create a margin dashboard: % frontier usage, tokens per task, cost per task, gross margin by customer tier.
If you only do one thing: separate “quality evals” from “production defaults.” Many teams test with frontier models and accidentally keep them as the default forever.
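The margin dashboard can start as a few aggregates over your request logs. The schema below is an assumption about what you log per completed task:

```python
from collections import defaultdict

# Assumed log schema: one record per completed task.
tasks = [
    {"tier": "enterprise", "tokens": 9_000, "cost": 0.050, "frontier": True},
    {"tier": "pro",        "tokens": 2_500, "cost": 0.010, "frontier": False},
    {"tier": "pro",        "tokens": 3_000, "cost": 0.012, "frontier": False},
]

frontier_share = sum(t["frontier"] for t in tasks) / len(tasks)
tokens_per_task = sum(t["tokens"] for t in tasks) / len(tasks)
cost_per_task = sum(t["cost"] for t in tasks) / len(tasks)

costs_by_tier = defaultdict(list)
for t in tasks:
    costs_by_tier[t["tier"]].append(t["cost"])

print(f"% frontier: {frontier_share:.0%} | tokens/task: {tokens_per_task:,.0f} "
      f"| cost/task: ${cost_per_task:.3f}")
for tier, costs in costs_by_tier.items():
    print(f"{tier}: avg cost/task ${sum(costs) / len(costs):.3f}")
```

Once these numbers exist, the 30-day reset stops being a debate and becomes a before/after comparison.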
Where AI gross margins go next in the U.S. digital economy
Foundation-layer compute margins improving is a genuine milestone. It suggests AI infrastructure providers are learning to run more efficiently inside modern data centers, which reduces risk for every business that depends on those APIs.
But for B2B startups, the treadmill doesn’t stop unless you design your product and pricing around it. I’m opinionated here: the winners won’t be the teams with the fanciest model demos—they’ll be the teams that control cost per task while shipping outcomes customers will pay for.
If you’re building AI-powered software in 2026, what’s your plan when your users demand “deeper reasoning” and your tokens-per-task doubles again: do your margins break, or does your system adapt?