China’s race to replace Nvidia chips is reshaping AI infrastructure. Here’s what energy and utilities teams should do to build resilient, portable AI compute.
AI chip race: what energy utilities should do now
The most expensive part of many AI programs isn’t the data science team—it’s the compute. And in late 2025, compute has become geopolitical.
China’s biggest tech companies are racing to replace Nvidia GPUs with domestic AI accelerators. On the surface, that sounds like a semiconductor story. For energy and utilities leaders, it’s a capacity-planning story: where your AI workloads can run, what they’ll cost, and how reliably you can scale them.
I’ve seen too many grid AI initiatives get blocked for a surprisingly basic reason: the infrastructure roadmap assumes one vendor stack, one programming model, and predictable chip supply. That assumption is now fragile. If you operate cloud, data centers, or hybrid environments that support grid optimization, demand forecasting, renewable integration, or predictive maintenance, you should treat the AI chip race as a near-term operational risk—and an opportunity to build more resilient compute.
China’s push away from Nvidia is really about control
Answer first: China’s move to reduce Nvidia dependence is reshaping the AI infrastructure landscape, and energy companies should plan for multi-accelerator environments.
For more than a decade, Nvidia GPUs have powered much of China’s AI ecosystem. Even under export controls, China still bought “China-compliant” variants (like the H800, A800, and H20). By 2025, distrust accelerated—state media allegations about security risks and regulatory scrutiny pushed large buyers toward domestic options.
Whether you agree with the politics or not, the operational lesson is straightforward: AI infrastructure is now part of national critical infrastructure policy. Energy utilities already live in that world—NERC CIP, supply-chain audits, vendor risk management, sovereign cloud requirements. AI compute is joining the same category.
For utilities and energy producers, this shows up in three ways:
- Supply risk: GPU access can change fast due to policy shifts.
- Lock-in risk: software ecosystems can be as restrictive as hardware availability.
- Cost volatility: chip scarcity or forced migrations can inflate training and inference costs.
The “Nvidia replacement” problem is harder than most people think
Answer first: Replacing Nvidia isn’t about matching teraflops; it’s about matching the whole stack—memory, interconnects, software, and production scale.
Nvidia’s advantage isn’t just raw compute. It’s the combination of:
- High-bandwidth memory capacity and speed (often the limiting factor for large models)
- Interconnect bandwidth (chip-to-chip and node-to-node scaling)
- Mature software tooling (CUDA, libraries, debuggers, kernel tuning, community knowledge)
- Manufacturing volume and integration patterns (validated server designs, networking, cooling)
This matters directly in energy workloads. Grid and asset AI isn’t one workload—it’s a portfolio:
- Training: forecasting models, fault detection models, foundation models for operations documents
- Inference at scale: real-time anomaly detection, dispatch recommendations, call-center copilots
- Simulation & optimization: probabilistic load flow, contingency analysis, renewables siting
Some of those are memory-bound, some are latency-bound, and some are network-bound. So when a region shifts chip suppliers, the first break isn’t accuracy—it’s throughput, cost per run, and time-to-retrain.
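One way to make those labels concrete is a quick roofline-style check: compare a workload’s arithmetic intensity (FLOPs per byte of memory traffic) with the accelerator’s ratio of peak compute to memory bandwidth. The sketch below uses illustrative A100-class figures and a hypothetical forecasting-inference step, not measurements from any real system.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def machine_balance(peak_flops: float, mem_bandwidth_bytes_s: float) -> float:
    """Peak FLOPs the chip can do per byte it can fetch from memory."""
    return peak_flops / mem_bandwidth_bytes_s

# Illustrative A100-class figures: ~312 TFLOPS FP16 tensor compute, ~2 TB/s HBM.
balance = machine_balance(peak_flops=312e12, mem_bandwidth_bytes_s=2e12)  # ~156 FLOPs/byte

# Hypothetical inference step for a forecasting model: 4 GFLOPs of work
# touching 200 MB of weights and activations per request.
ai = arithmetic_intensity(flops=4e9, bytes_moved=200e6)  # 20 FLOPs/byte

if ai < balance:
    print(f"Memory-bound: intensity {ai:.0f} < balance {balance:.0f} FLOPs/byte")
else:
    print(f"Compute-bound: intensity {ai:.0f} >= balance {balance:.0f} FLOPs/byte")
```

If a workload sits well below the machine balance, a faster chip buys little; more memory bandwidth, better batching, or quantization helps more. That is part of why a supplier shift shows up first in throughput and cost per run rather than accuracy.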
Who’s building the domestic alternatives—and why it matters to data centers
Answer first: Huawei, Alibaba, Baidu, and Cambricon are building chips plus software ecosystems; that combination will change cloud and colocation options in Asia.
The IEEE Spectrum reporting highlights four main contenders. Their strategies map neatly onto how future AI data centers may be built and sold.
Huawei: winning by scaling out, not matching one GPU
Huawei’s Ascend roadmap is explicit about performance targets and, importantly, cluster-scale design.
- Ascend 910B: roughly comparable to Nvidia A100-era capability.
- Ascend 910C: a dual-chiplet approach; Huawei has demonstrated large Atlas clusters.
- Ascend 950 (2026 target): aims for ~1 petaflop of FP8 compute, with 128–144 GB of memory and up to 2 TB/s of interconnect bandwidth.
- Atlas SuperPoD approach: thousands of chips linked into rack-scale “supercomputing clusters,” including a 2026 plan for 8,192 chips and 8 exaflops of FP8 performance.
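A quick arithmetic check, using only the figures reported above, shows how the cluster plan and the per-chip roadmap line up:

```python
cluster_fp8_flops = 8e18   # reported 2026 SuperPoD target: 8 exaflops FP8
chips_per_cluster = 8_192  # reported chip count for the same plan

per_chip = cluster_fp8_flops / chips_per_cluster
print(f"{per_chip / 1e15:.2f} PFLOPS FP8 per chip")  # ~0.98 PFLOPS
```

Roughly 1 petaflop of FP8 per chip matches the target quoted for the Ascend 950; the cluster number is essentially the per-chip roadmap multiplied out.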
For energy and utilities, the key point isn’t which spec wins. It’s that Huawei is betting on “if one chip is behind, use more chips”—and sell you the full system.
That’s relevant because utilities increasingly buy outcomes (AI service levels, model update cadence, inference SLAs), not chips. A vendor that can deliver predictable cluster throughput—especially for training and batch simulation—can be attractive even if the single-accelerator peak numbers trail Nvidia.
Alibaba: protecting cloud margins with in-house AI accelerators
Alibaba’s direction follows classic cloud-provider logic: control your infrastructure costs and your chip supply.
- Earlier chip: the Hanguang 800 (2019), an accelerator focused on efficient inference.
- Newer training-grade direction: a PPU chip positioned as a rival to Nvidia’s H20, with high-bandwidth memory (reported 96 GB) and modern I/O (PCIe 5.0).
- Infrastructure packaging: upgraded “supernode” servers with 128 AI chips per rack and liquid cooling.
If you run utility workloads in public cloud (or use a cloud-like internal platform), Alibaba’s pattern is a preview of what more providers will do: pair proprietary accelerators with purpose-built racks and cooling to reduce cost per token / cost per simulation.
For energy AI programs, that can be good news—if your software is portable.
Baidu: proving it can train large models on its own clusters
Baidu’s Kunlun line evolved from inference-leaning accelerators to visible cluster-scale claims in 2025.
- A reported 30,000-chip cluster based on third-gen P800 processors.
- P800 performance reported around 345 TFLOPS FP16, in the A100/910B neighborhood.
Baidu’s message is important: “We can train our own models at scale on non-Nvidia hardware.”
In energy and utilities, this mirrors what you’re likely to do internally: keep mission-critical models running even if your preferred GPUs are delayed, restricted, or too expensive.
Cambricon: a market signal that domestic chips can become profitable
Cambricon’s comeback is as much about commercialization as performance.
- Stock performance: reported nearly 500% rise over 12 months.
- MLU 590 (2023): reported 345 TFLOPS FP16 and support for FP8, with claims that in some scenarios it can rival the H20.
Energy leaders should pay attention to commercialization because it impacts long-term support, procurement viability, and ecosystem stability. A chip that looks good in a lab but can’t ship in volume is irrelevant to data center planning.
What this means for AI in energy: grid intelligence will become more localized
Answer first: The chip race accelerates localized AI processing—closer to the grid edge and inside regional data centers—because sovereignty, latency, and resilience are now first-class requirements.
Here are four concrete implications for AI in energy and utilities:
1) Grid optimization will shift toward regional “AI capacity blocks”
Training large forecasting or dispatch models may concentrate in fewer regional hubs where accelerators are available and compliant. Think of it as AI capacity blocks that you reserve the way you reserve transmission capacity.
Practical effect: your AI platform team needs to model capacity in GPU-hours per week (or accelerator-hours), not “number of servers.”
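A minimal sketch of what modeling capacity in accelerator-hours can look like; the workload names, job counts, and utilization factor are hypothetical placeholders that show the shape of the calculation, not benchmarks.

```python
# Hypothetical weekly AI workload plan; replace with your own numbers.
workloads = {
    "load_forecast_retrain":   {"jobs": 2,   "accelerators": 8,  "hours": 6.0},
    "scada_anomaly_inference": {"jobs": 168, "accelerators": 1,  "hours": 1.0},  # hourly batches
    "contingency_simulation":  {"jobs": 5,   "accelerators": 16, "hours": 4.0},
}

def accelerator_hours(plan: dict) -> float:
    """accelerator-hours = jobs * accelerators per job * hours per job, summed."""
    return sum(w["jobs"] * w["accelerators"] * w["hours"] for w in plan.values())

needed = accelerator_hours(workloads)

# Capacity you can actually reserve: e.g., 24 accelerators, all week, at 70% usable utilization.
available = 24 * 24 * 7 * 0.70

print(f"needed:    {needed:.0f} accelerator-hours/week")
print(f"available: {available:.0f} accelerator-hours/week")
print(f"{'headroom' if available >= needed else 'shortfall'}: {abs(available - needed):.0f}")
```

Framing demand this way also keeps procurement conversations portable: you can ask whether the same weekly plan still fits if the accelerator type behind it changes.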
2) Real-time analytics will favor inference-first architectures
Even if training is constrained, utilities can still deliver value with strong inference:
- transformer-based equipment log summarization
- computer-vision inspections for vegetation management
- anomaly detection on SCADA and PMU streams
That pushes you toward architectures where:
- models are distilled/quantized
- inference runs on mixed accelerators (see the sketch after this list)
- retraining cycles are less frequent but more predictable
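As an illustration of the “mixed accelerators” point, here is a minimal inference sketch using ONNX Runtime’s execution-provider list. The model file, input shape, and provider preference order are assumptions for the example, and which providers are actually available depends on how the runtime was installed.

```python
import numpy as np
import onnxruntime as ort

# Preference order is an assumption: try vendor accelerators first, fall back to CPU.
preferred = ["TensorrtExecutionProvider", "CUDAExecutionProvider",
             "ROCMExecutionProvider", "OpenVINOExecutionProvider",
             "CPUExecutionProvider"]
available = ort.get_available_providers()
providers = [p for p in preferred if p in available] or ["CPUExecutionProvider"]

# Hypothetical exported anomaly-detection model.
session = ort.InferenceSession("substation_anomaly.onnx", providers=providers)

input_name = session.get_inputs()[0].name
batch = np.random.rand(1, 64).astype(np.float32)  # placeholder SCADA feature window
scores = session.run(None, {input_name: batch})[0]
print("providers:", session.get_providers(), "score shape:", scores.shape)
```

The point is not ONNX Runtime specifically; it is that the deployment code expresses a preference list and a fallback rather than a hard dependency on one vendor’s stack.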
3) Renewable integration workloads will demand better interconnects
High-renewables grids require faster simulation loops and probabilistic planning. Those workloads aren’t just compute-heavy; they’re communication-heavy across nodes. The reporting’s emphasis on interconnect bandwidth (TB/s and PB/s at cluster scale) matters because network fabric becomes the limiter.
If your data center network is designed for traditional enterprise traffic, you’ll pay for it in AI cluster underutilization.
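To put numbers on that, here is a rough estimate of the communication time per training step for data-parallel training, assuming an idealized ring all-reduce (each node moves about 2 × (N−1)/N times the gradient size) and ignoring latency and compute/communication overlap. The model size and node count are hypothetical.

```python
def allreduce_seconds(model_bytes: float, nodes: int, link_bytes_per_s: float) -> float:
    """Idealized ring all-reduce time: each node moves ~2*(N-1)/N * model_bytes."""
    return 2 * (nodes - 1) / nodes * model_bytes / link_bytes_per_s

# Hypothetical example: a 2-billion-parameter forecasting model with FP16 gradients (~4 GB),
# trained data-parallel across 32 nodes.
grad_bytes = 2e9 * 2  # 2B params * 2 bytes (FP16)

for gbps in (25, 100, 400):  # per-node network bandwidth in Gbit/s
    t = allreduce_seconds(grad_bytes, nodes=32, link_bytes_per_s=gbps * 1e9 / 8)
    print(f"{gbps:>4} Gbit/s -> ~{t:.2f} s of communication per step")
```

If the compute portion of a step takes a fraction of a second, the 25 Gbit/s case leaves the accelerators idle most of the time, which is exactly the underutilization cost described above.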
4) Predictive maintenance becomes a portability test
Predictive maintenance models should be the easiest AI win. Yet they often become portability nightmares because teams bake in one vendor’s libraries.
If you can run the same vibration/thermal imagery models across different accelerators without re-architecting, you’re in a far stronger position when supply changes.
The practical playbook: 7 moves energy AI leaders should make in 2026 planning
Answer first: Treat accelerator diversity like fuel diversity—design for it, test it, and contract for it.
1) Define your “compute bill of materials.” Separate workloads into training, batch inference, real-time inference, and simulation. Put target latency, throughput, and memory needs on paper.
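The bill of materials does not need heavy tooling; a small structured record per workload is enough to force the right questions. A minimal sketch, with hypothetical workloads and targets:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkloadSpec:
    name: str
    kind: str                           # "training" | "batch_inference" | "realtime_inference" | "simulation"
    latency_target_ms: Optional[float]  # None where latency isn't the binding constraint
    throughput_target: str
    memory_gb: float                    # accelerator memory the model + batch needs

compute_bom = [
    WorkloadSpec("load_forecast_train", "training", None, "1 retrain / week", 64),
    WorkloadSpec("scada_anomaly", "realtime_inference", 100, "5k events / s", 8),
    WorkloadSpec("vegetation_cv", "batch_inference", None, "200k images / night", 16),
    WorkloadSpec("contingency_sim", "simulation", None, "500 cases / day", 32),
]

for w in compute_bom:
    print(f"{w.name:<22} {w.kind:<19} mem={w.memory_gb:>3.0f} GB  {w.throughput_target}")
```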
2) Adopt a multi-accelerator software strategy. If your stack assumes CUDA everywhere, you don’t have a strategy; you have a dependency. Build portability into your model packaging and deployment pipelines.
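One common way to build in that portability, sketched under the assumption that models are trained in PyTorch: export to a vendor-neutral format (ONNX here) at packaging time, so deployment does not inherit the training stack’s hardware dependency. The model and shapes are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder stand-in for a trained forecasting model.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1)).eval()

example_input = torch.randn(1, 64)  # hypothetical feature window

# Export once; the artifact can then be served by runtimes that target
# NVIDIA, AMD, Intel, or CPU back ends without re-architecting the model.
torch.onnx.export(
    model,
    example_input,
    "load_forecast.onnx",
    input_names=["features"],
    output_names=["forecast"],
    dynamic_axes={"features": {0: "batch"}, "forecast": {0: "batch"}},
)
print("packaged load_forecast.onnx")
```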
3) Standardize model interfaces and observability. You want the same monitoring and rollback behavior whether inference runs on GPUs, NPUs, or CPUs.
4) Pressure-test interconnect and storage. AI accelerators don’t save you if your data pipeline can’t feed them. Measure utilization and identify bottlenecks in network fabric and storage I/O.
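A crude but useful pressure test: measure how fast the data pipeline alone can deliver batches before any accelerator is involved, and compare that with the feed rate your accelerators need. The loader and the required rate below are placeholders for your real pipeline and profiling numbers.

```python
import time
import numpy as np

def fake_loader(num_batches: int = 200, batch_shape=(256, 64)):
    """Stand-in for your real data loader (files, Kafka, historian queries, ...)."""
    for _ in range(num_batches):
        yield np.random.rand(*batch_shape).astype(np.float32)

start = time.perf_counter()
total_bytes = 0
for batch in fake_loader():
    total_bytes += batch.nbytes
elapsed = time.perf_counter() - start

pipeline_gb_s = total_bytes / elapsed / 1e9
required_gb_s = 2.0  # assumption: the rate that keeps your accelerators busy, from profiling

print(f"pipeline feeds {pipeline_gb_s:.2f} GB/s; "
      f"{'OK' if pipeline_gb_s >= required_gb_s else 'bottleneck is storage/network, not compute'}")
```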
5) Plan for cooling and power density now. Liquid cooling and high chip-per-rack counts aren’t niche anymore. If you’re building or retrofitting data halls, design for higher rack densities and variable load profiles.
6) Contract for capacity, not brand names. Procurement language should focus on measurable outcomes (availability, throughput, support SLAs, security requirements), not a single manufacturer.
7) Run “migration drills.” Pick one non-critical model (say, a substation anomaly detector) and prove you can redeploy it to a different accelerator class with minimal rework.
A useful rule: if switching accelerators takes longer than retraining the model, your MLOps pipeline is too brittle.
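A migration drill is easier to sign off when it ends with an automated parity check: the same inputs pushed through the current deployment and the alternative one, compared within a tolerance. A minimal sketch, assuming both targets expose a simple predict call; the functions and tolerances here are placeholders.

```python
import numpy as np

def parity_check(predict_a, predict_b, batches, rtol=1e-3, atol=1e-4):
    """Return True if two deployments agree on every batch within tolerance."""
    for batch in batches:
        a, b = predict_a(batch), predict_b(batch)
        if not np.allclose(a, b, rtol=rtol, atol=atol):
            return False
    return True

def predict_gpu(x):
    # Placeholder: in a real drill this calls the current GPU-backed service.
    return x @ np.ones((64, 1)) * 0.5

def predict_alt(x):
    # Placeholder: this calls the candidate deployment on a different accelerator.
    return (x @ np.ones((64, 1))) / 2.0

batches = [np.random.rand(32, 64) for _ in range(10)]
print("migration drill passed:", parity_check(predict_gpu, predict_alt, batches))
```

Exact bitwise equality is the wrong bar across accelerator families; agree on a tolerance that reflects how the model’s outputs are actually used operationally.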
Where this goes next: AI infrastructure becomes a competitive advantage for reliability
Utilities don’t get credit for owning flashy hardware. They get credit for keeping the lights on. The AI chip race is pushing the industry toward a reality where compute resiliency is part of operational resiliency—right alongside spare transformers, mutual aid, and blackstart plans.
In the “AI in Cloud Computing & Data Centers” series, we often talk about workload management and infrastructure optimization. This is the flip side: geopolitically driven hardware constraints that force better architecture.
The energy companies that win in 2026–2028 will be the ones that can say, with a straight face: we can run our core AI workloads on more than one accelerator stack, in more than one region, without pausing operations.
So here’s the forward-looking question worth taking to your next platform steering meeting: If your preferred AI chips disappeared for 12 months, which grid and asset AI capabilities would you lose—and which ones would keep running?