Google Cloud’s December 2025 updates show AI moving into cloud operations. See what’s new for agent engines, GPUs, GKE inference, and security.

AI Ops in Google Cloud: What December 2025 Updates Mean
December release notes rarely read like a story—until you zoom out. Google Cloud’s latest updates (spanning databases, Kubernetes, security, networking, and AI tooling) point to one clear direction: AI is moving from “something you build” to “something that runs your cloud.”
For teams responsible for cloud operations and data center outcomes—uptime, cost, governance, throughput—this matters because the bottleneck isn’t features anymore. The bottleneck is operational decision-making at scale: Which model should run where? How do we keep GPU capacity predictable? How do we let agents touch production data without creating a security incident?
Below is a practical read of the December 2025 Google Cloud updates through the lens of this series—AI in Cloud Computing & Data Centers—with a focus on infrastructure optimization, workload management, and intelligent resource allocation.
The big shift: AI moves into the control plane
The most important change isn’t any single release note. It’s the pattern: AI capabilities are getting embedded where platform decisions are made—in databases, API gateways, IAM, monitoring, and Kubernetes.
Three examples from this month make that obvious:
- Vertex AI Agent Engine Sessions and Memory Bank are now GA, and there’s updated pricing with Sessions/Memory Bank/Code Execution starting to charge on January 28, 2026. That pricing change is a signal: Google expects production-grade agent workloads to become normal parts of operations.
- Dataplex Universal Catalog adds natural language search (GA). That’s not just “nice UX.” It’s foundational for governance at scale, because discovery becomes feasible for more people (and more agents).
- Apigee API hub supports Model Context Protocol (MCP) and Cloud API Registry is in Preview to discover/govern MCP servers and tools. That’s the missing plumbing for agentic systems: you can’t scale agents without treating tools like APIs.
If you’re building an AI-enabled operations layer, this matters because tool governance becomes the new API governance.
What to do next
If you own platform strategy, add these two questions to your 2026 planning:
- Where will “agent tools” live and be governed? (API hub + MCP is the direction.)
- How will you observe and budget agent behavior? (Sessions/Memory Bank billing + trace/logging is the direction.)
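To make the second question concrete, here is a minimal sketch of what "budgeting agent behavior" can mean in practice: cap tool calls and spend per session, and refuse work past the cap. Every name here (`SessionBudget`, `record_call`, the dollar figures) is illustrative, not a Vertex AI API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: per-session budget tracking for agent tool calls.
# Not a Vertex AI API; the class and limits are illustrative assumptions.

@dataclass
class SessionBudget:
    session_id: str
    max_tool_calls: int = 50
    max_cost_usd: float = 1.00
    calls: list = field(default_factory=list)
    cost_usd: float = 0.0

    def record_call(self, tool: str, cost_usd: float) -> bool:
        """Log a tool call; return False if it would exceed the session budget."""
        over_calls = len(self.calls) + 1 > self.max_tool_calls
        over_cost = self.cost_usd + cost_usd > self.max_cost_usd
        if over_calls or over_cost:
            return False
        self.calls.append(tool)
        self.cost_usd += cost_usd
        return True

budget = SessionBudget("sess-123", max_tool_calls=3, max_cost_usd=0.10)
print(budget.record_call("search_docs", 0.02))  # True
print(budget.record_call("query_db", 0.02))     # True
print(budget.record_call("query_db", 0.09))     # False: would exceed cost cap
```

Once Sessions and Memory Bank start billing, this kind of per-session accounting is what turns "observe agent behavior" from a dashboard into an enforceable control.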
Data centers and compute: getting serious about capacity predictability
Most companies get caught flat-footed on accelerators. They plan models, then discover procurement-by-API is still procurement.
Google Cloud made several moves that directly address this:
Future reservations for GPUs/TPUs become operational tooling
Compute Engine now supports future reservation requests in calendar mode (GA) to reserve GPU/TPU/H4D resources for up to 90 days. This is exactly what you want for:
- pre-training windows
- fine-tuning sprints
- batch inference backfills
- HPC runs
It turns “hope capacity exists” into “capacity is scheduled,” which is a real step toward predictable AI infrastructure.
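A simple way to operationalize this in planning tooling: validate that a job's window fits the calendar-mode cap before anyone files the reservation request. The 90-day figure comes from the release note; the helper itself is a sketch, not a Google Cloud client call.

```python
from datetime import date, timedelta

# Illustrative planning helper for calendar-mode future reservations.
# The 90-day cap is from the release note; this function is a local sketch,
# not part of any Google Cloud SDK.

MAX_RESERVATION_DAYS = 90

def reservation_window(start: date, duration_days: int) -> tuple[date, date]:
    """Return (start, end) for a reservation, or raise if it exceeds the cap."""
    if duration_days > MAX_RESERVATION_DAYS:
        raise ValueError(
            f"Calendar-mode reservations cap out at {MAX_RESERVATION_DAYS} days"
        )
    return start, start + timedelta(days=duration_days)

start, end = reservation_window(date(2026, 2, 1), 14)
print(start, end)  # 2026-02-01 2026-02-15
```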
Sole-tenancy expands to more GPU machine types
Sole-tenancy support expanded for A2 Ultra/Mega/High and A3 Mega/High GPU machine types. For regulated environments, dedicated capacity is often the difference between being allowed to run AI workloads and being blocked.
AI Hypercomputer: node health prediction is GA
AI Hypercomputer adds node health prediction for AI-optimized GKE clusters, helping avoid scheduling on nodes likely to degrade in the next five hours.
That’s an operationally meaningful horizon. Five hours is long enough to prevent avoidable failures, but short enough to be actionable for schedulers.
Snippet-worthy takeaway: Predictive scheduling beats reactive incident response—especially on large GPU fleets.
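The scheduling logic this enables is simple to state: skip any node whose predicted healthy window is shorter than the job's expected runtime. A minimal sketch, with made-up field names (the real prediction signal surfaces through GKE, not a dict like this):

```python
# Sketch of predictive placement: skip nodes whose predicted time-to-degrade
# is shorter than the job's expected runtime. Field names and fleet data are
# hypothetical; the real signal comes from AI Hypercomputer via GKE.

def schedulable_nodes(nodes: list[dict], job_hours: float) -> list[str]:
    """Keep nodes predicted to stay healthy for the whole job."""
    return [n["name"] for n in nodes if n["predicted_healthy_hours"] > job_hours]

fleet = [
    {"name": "gpu-node-a", "predicted_healthy_hours": 5.0},
    {"name": "gpu-node-b", "predicted_healthy_hours": 1.5},  # likely to degrade soon
    {"name": "gpu-node-c", "predicted_healthy_hours": 48.0},
]
print(schedulable_nodes(fleet, job_hours=4))  # ['gpu-node-a', 'gpu-node-c']
```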
A4/B200 firmware interruption warning
There’s also a very real-world note: A4 VM workloads might see interruptions due to a firmware issue for NVIDIA B200 GPUs, and Google recommends resetting GPUs at least once every 60 days.
This is the kind of operational detail that can quietly destroy a training pipeline if you ignore it.
What to do next
If you run GPU clusters, do these three things this quarter:
- Adopt calendar-mode future reservations for any training job that has a fixed start date.
- Add “GPU reset cadence” (where applicable) into your fleet maintenance automation.
- Pilot node health prediction on interruption-sensitive training workloads.
Databases become agent surfaces (and that’s both powerful and risky)
The release notes show Google pushing AI directly into the data layer:
Data agents for AlloyDB, Cloud SQL, and Spanner (Preview)
You can now build data agents that query database data using natural language (Preview) across:
- AlloyDB for PostgreSQL
- Cloud SQL for MySQL
- Cloud SQL for PostgreSQL
- Spanner
This is an explicit invitation to turn databases into tool backends for agents.
I like the direction, but I’ll be blunt: most organizations will ship this unsafely the first time.
Here’s what a safer “agent-to-database” path looks like:
- start with read-only, narrow schemas
- enforce row-level security (where possible) and strict IAM
- log prompts and tool calls (treat them like privileged access)
- define a “safe query surface” (views, stored procedures, or constrained functions)
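The last item, a "safe query surface," can be as simple as an allowlist: the agent's tool calls resolve to named, pre-approved statements against read-only views, and everything else is refused. A sketch under that assumption (the view and query names are illustrative):

```python
# Sketch of a "safe query surface": the agent can only invoke named,
# pre-approved queries against read-only views, never arbitrary SQL.
# View names and queries are illustrative.

SAFE_QUERIES = {
    "orders_by_region": "SELECT region, count(*) FROM v_orders_agent GROUP BY region",
    "open_tickets": "SELECT id, status FROM v_tickets_agent WHERE status = 'open'",
}

def resolve_agent_query(name: str) -> str:
    """Map an agent's tool call to an approved SQL statement, or refuse."""
    if name not in SAFE_QUERIES:
        raise PermissionError(f"Query '{name}' is outside the safe surface")
    return SAFE_QUERIES[name]

print(resolve_agent_query("open_tickets"))
# resolve_agent_query("DROP TABLE users") -> PermissionError
```

The point is that the model never authors SQL that reaches production; it only selects from a surface you reviewed, which also makes the logging requirement above trivial to satisfy.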
AlloyDB: Gemini 3 Flash for in-database AI functions
AlloyDB now supports Gemini 3.0 Flash (Preview) for generative AI functions like AI.GENERATE using gemini-3-flash-preview.
This is part of a broader trend: the database becomes an AI runtime, not just storage.
Query plan stability and performance regression control
AlloyDB also introduced query plan management (Preview) to prevent query plan regressions.
This fits the “AI in cloud ops” theme more than it seems: performance regressions are a major operational tax. Features that keep performance stable reduce the need for constant tuning—exactly what you want when workloads are changing rapidly because teams are adding AI features.
What to do next
If you’re considering “AI inside the database,” start with these guardrails:
- Create an agent-access schema separate from production OLTP tables.
- Require approvals for any agent that can write data.
- Use enhanced backups (Cloud SQL enhanced backups are GA) with longer retention if agents will affect workflows.
Kubernetes and serving: inference gets a first-class gateway
If you’re serving models on Kubernetes, the December updates include a big one.
GKE Inference Gateway is GA (production-ready)
GKE Inference Gateway reached GA with changes that directly impact cost and latency:
- API moved to stable v1 and introduces `InferenceObjective` (replacing `InferenceModel`) with a migration path.
- Prefix-aware routing to keep related requests on the same replica, improving KV-cache hit rates.
- Google claims Time-to-First-Token latency improvements up to 96% in conversational-style traffic when routing preserves prefixes.
- API key authentication via Apigee integration.
- Body-based routing for OpenAI-compatible request formats.
Those are not cosmetic upgrades. They’re about operationalizing inference: stable APIs, predictable routing, and enforceable access control.
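To see why prefix-aware routing helps KV-cache hit rates, consider the simplest possible version: hash a stable prefix key (say, a conversation ID) so every follow-up turn lands on the replica that already holds the cache. This mimics the idea only; it is not the gateway's actual algorithm.

```python
import hashlib

# Sketch of prefix-aware routing: deterministically map a stable prefix key
# (e.g. a conversation id) to one replica, so follow-up requests hit the
# replica that already holds the KV cache. Illustrative only; not the
# Inference Gateway's real algorithm.

REPLICAS = ["replica-0", "replica-1", "replica-2"]

def route(prefix_key: str) -> str:
    """Map a conversation's prefix key to one replica, consistently."""
    digest = hashlib.sha256(prefix_key.encode()).digest()
    return REPLICAS[int.from_bytes(digest[:4], "big") % len(REPLICAS)]

# All turns of one conversation route to the same replica:
assert route("conv-42") == route("conv-42")
print(route("conv-42"))
```

A production gateway also has to handle replica churn and load skew, which is exactly why having this logic in a managed, GA layer beats hand-rolling it per service.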
Regional Cloud Service Mesh and extension-based routing
Regional Cloud Service Mesh is in public preview, and Service Extensions continue expanding dynamic forwarding and authorization options.
This matters because multi-tenant and multi-model routing patterns are increasingly common. The load balancer and mesh layers are becoming places where AI workloads are optimized—before the request even hits the model server.
What to do next
If you’re running production inference on GKE:
- Evaluate whether prefix-aware routing matches your traffic (chat, agentic, session-based workloads).
- Decide where auth should live: gateway layer (Apigee + Inference Gateway) is the cleanest for consistent enforcement.
- Treat your inference gateway configs as infrastructure as code and add regression testing (routing bugs are expensive).
Security and governance: AI increases blast radius, so guardrails tighten
AI expands what’s possible—then security teams ask the right question: “What happens when this is abused?”
This month’s Google Cloud updates include several governance and security improvements that map directly to AI operations.
Cloud KMS: Single-tenant Cloud HSM is GA
Single-tenant Cloud HSM is now GA in us-central1, us-east4, europe-west1, and europe-west4.
For regulated AI workloads (health, finance, public sector), dedicated HSMs support stronger key isolation—especially relevant for:
- model encryption keys
- signing keys for agent tools
- customer-managed key requirements
IAM: Gemini-assisted role suggestions (Preview)
IAM now offers Gemini-assisted suggestions for predefined roles (Preview), and for custom roles through Cloud Assist.
The practical value: less guessing and fewer over-permissioned roles, if used correctly.
The caution: role suggestions are only useful if your org has clear boundaries and naming conventions. Otherwise you’ll get “helpful” recommendations that still don’t match your internal patterns.
VPC Service Controls: violation analyzer is GA
The VPC Service Controls violation analyzer is GA. This is a big deal for teams trying to put AI workloads inside perimeters. Debugging VPC-SC denials is famously painful; better diagnostics reduce time-to-fix.
Model Armor keeps expanding
Model Armor capabilities and integrations show up repeatedly in these notes, including:
- configuration for MCP servers (Preview)
- logging for sanitization operations
- broader availability signals
This aligns with the reality that prompt injection and data exfiltration aren’t theoretical once agents can call tools.
What to do next
If you’re deploying agentic AI in cloud environments:
- Put VPC-SC and IAM reviews in front of agent rollouts, not after.
- Log and retain agent interactions like audit logs.
- Decide whether you’ll standardize on a safety layer (Model Armor-style) for inbound/outbound content.
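"Log and retain agent interactions like audit logs" implies append-only, tamper-evident records. One common pattern is a hash chain: each record includes the hash of the previous one, so any edit breaks the chain. A minimal sketch (the record fields and actor names are invented, and real deployments would use Cloud Audit Logs or similar rather than this):

```python
import hashlib
import json
from datetime import datetime, timezone

# Sketch of treating agent interactions like audit logs: append-only JSON
# records with a hash chain so tampering is detectable. Field names and
# actors are illustrative; this is not a Cloud Audit Logs API.

def append_audit_record(log: list, actor: str, action: str, detail: str) -> dict:
    """Append a chained, timestamped record of one agent action."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "detail": detail,
        "prev": prev_hash,
    }
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append(body)
    return body

log: list = []
append_audit_record(log, "agent:invoice-bot", "tool_call", "query=open_tickets")
append_audit_record(log, "agent:invoice-bot", "tool_call", "query=orders_by_region")
print(len(log), log[1]["prev"] == log[0]["hash"])  # 2 True
```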
A practical “what should I pay attention to?” shortlist
If you don’t have time to track everything, here are the updates most likely to impact AI in cloud operations in the next 60–90 days:
- GKE Inference Gateway GA (serving performance + auth + OpenAI compatibility)
- Compute Engine calendar-mode future reservations GA (capacity predictability)
- Vertex AI Agent Engine Sessions/Memory Bank GA + charging starts Jan 28, 2026 (budgeting + architecture)
- Dataplex natural language search GA (governance and discovery)
- Database data agents Preview (high leverage, high risk)
- Single-tenant Cloud HSM GA (regulated AI security posture)
Where this is heading in 2026
The trend line is clear: cloud providers are building an AI-native operations stack where scheduling, routing, capacity planning, and governance get assisted (or automated) by AI.
The teams that win won’t be the ones who “use more AI.” They’ll be the ones who:
- standardize tool governance (MCP, API hubs, registries)
- make capacity predictable (reservations, fleet health)
- treat agent actions like production changes (auditability, blast-radius control)
If you’re building your 2026 roadmap for AI in cloud computing and data centers, a good next step is to identify one workflow—GPU scheduling, inference routing, or agent-to-data access—and harden it end-to-end with the new platform capabilities.
What would you automate first if your goal was fewer incidents and lower cloud spend: capacity planning, incident triage, or inference routing?