VA’s $37B EHR program highlights cost and rollout risks. Here’s how practical AI improves oversight, readiness, and safer batched deployments.

VA EHR Costs: AI Tactics to Cut Risk in Deployments
The VA’s electronic health record modernization is a familiar government story: the mission is clear, the stakes are high, and the complexity has a way of multiplying the minute deployment plans meet clinical reality. Lawmakers recently put a sharper point on two concerns—total cost and batched go-lives—as the Department of Veterans Affairs prepares to resume deployments of its Oracle Health EHR.
Here’s the number that frames the whole conversation: VA’s latest lifecycle estimate is about $37 billion, after earlier projections (including a 2022 analysis) put the potential total closer to $50 billion. And after pausing most rollouts in April 2023, the system has reached 6 of 170 VA medical centers so far. When you’re spending at that scale and moving at that pace, “Are we ready?” isn’t political theater—it’s basic governance.
This post is part of our AI in Government & Public Sector series, and I’m going to take a stance: large EHR programs don’t fail because they lack effort; they fail because they lack operational feedback loops that are fast enough to match clinical work. That’s where practical AI (not hype) can help—especially with cost transparency, deployment readiness, and patient safety controls.
What the VA EHR fight is really about: trust, not software
Answer first: The current scrutiny isn’t mainly about Oracle Health vs. legacy systems—it’s about whether VA can prove the program is controllable: financially, operationally, and clinically.
At the House Veterans’ Affairs Technology Modernization Subcommittee hearing, lawmakers raised two issues that should sound familiar to anyone working federal modernization:
- Unclear total cost and cost breakdowns. A top complaint was that lifecycle estimates need to be broken down into understandable buckets—program office costs, consulting, infrastructure, maintenance—so oversight isn’t reduced to a single headline number.
- Deployment strategy risk. VA is planning a near-simultaneous go-live at four Michigan facilities in April 2026, with leaders also signaling higher-volume rollouts later (including an ambition of 26 sites in 2027).
There’s also a quiet third issue underneath both: credibility. If decision-makers don’t trust the cost model or the readiness signals, funding becomes harder, talent attrition increases, and operational teams get stuck in a defensive posture instead of improvement mode.
In government health IT, trust is built the boring way: measurable performance, transparent reporting, and a repeatable delivery approach.
Batched deployments can work—if you treat them like a control problem
Answer first: Batched go-lives raise the risk of overload in support, training, and defect resolution; AI can reduce that risk by predicting demand, prioritizing fixes, and monitoring safety signals in near real time.
VA’s stated plan is a “market-based approach” where multiple medical centers go live together. That can be smart. You can standardize training, reuse configuration patterns, and share hard-won lessons across sites.
But batching also creates a fragile moment: if your command center, help desk, interface engines, and clinical trainers are sized for one site at a time, four sites at once becomes a multiplication of failure modes, not a scaling of efficiency.
Where deployments typically break (and what AI can do about it)
Below are the usual breakdown points for large clinical go-lives and the AI-enabled tactics that actually fit federal realities:
- Ticket storms and bottlenecked triage
  - Problem: Users flood support channels; triage becomes inconsistent; priority defects get buried.
  - AI tactic: Use clustering to group similar tickets, route them to the right resolver group, and flag “same root cause” issues quickly. This is less about flashy generative AI and more about practical classification and similarity detection (see the first sketch after this list).
- Training gaps disguised as “software defects”
  - Problem: Confusing workflows lead to workarounds and errors; support queues fill with how-to questions.
  - AI tactic: Analyze help desk logs and in-app usage to identify where users hesitate, backtrack, or abandon workflows, then push targeted micro-training for those specific screens/tasks.
- Interface and data migration friction
  - Problem: Labs, imaging, pharmacy, referrals, community care—interfaces fail in edge cases.
  - AI tactic: Apply anomaly detection to HL7/FHIR message flows and interface error logs. You don’t need perfect predictions; you need early warning and fast root-cause isolation.
- Patient safety signal dilution
  - Problem: Clinicians report “something feels off,” but signals are scattered across reports, notes, orders, and incident systems.
  - AI tactic: Build a safety signal aggregator that monitors near-miss patterns (e.g., order retractions, duplicate orders, unusual override rates) and escalates when thresholds move (see the second sketch after this list).
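To make the clustering tactic from the first item concrete, here is a minimal sketch, assuming scikit-learn is available: TF-IDF vectors plus agglomerative clustering over cosine distance to group near-duplicate tickets. The sample ticket texts and the distance threshold are illustrative assumptions, not a production triage pipeline.

```python
# Minimal sketch: group similar help desk tickets so triage can spot
# "same root cause" clusters early. Sample tickets and the distance
# threshold are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

tickets = [
    "Cannot sign pharmacy order, error 500 on submit",
    "Pharmacy order submit fails with server error",
    "Lab results not displaying for yesterday's draws",
    "Order signing throws 500 error in pharmacy module",
    "Missing lab results panel after go-live",
]

# Vectorize ticket text; cosine distance works well on TF-IDF vectors.
vectors = TfidfVectorizer(stop_words="english").fit_transform(tickets)

# Cluster without fixing the number of groups; distance_threshold is a
# tunable assumption (lower means stricter "same issue" grouping).
clustering = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0.8,
    metric="cosine",
    linkage="average",
).fit(vectors.toarray())

for cluster_id in sorted(set(clustering.labels_)):
    members = [t for t, label in zip(tickets, clustering.labels_) if label == cluster_id]
    print(f"Cluster {cluster_id} ({len(members)} tickets):")
    for text in members:
        print(f"  - {text}")
```

From there, routing is mostly bookkeeping: map each cluster to a resolver group and surface the largest clusters on the command center dashboard.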
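And for the safety-signal item, here is a rough sketch of threshold-based escalation, assuming pandas and a daily feed from the order audit log: compare each day’s order-retraction rate to a lagged rolling baseline and flag large deviations. The window, the three-sigma multiplier, and the sample figures are assumptions for illustration only.

```python
# Rough sketch: escalate when a near-miss signal (here, the order
# retraction rate) drifts well above its rolling baseline. Window and
# threshold values are illustrative, not validated clinical limits.
import pandas as pd

# Daily counts for one site; in practice this would come from the EHR's
# order audit log or incident reporting feed.
df = pd.DataFrame({
    "date": pd.date_range("2026-04-01", periods=10, freq="D"),
    "orders": [900, 950, 980, 1000, 970, 990, 1010, 995, 1005, 1000],
    "retractions": [18, 19, 20, 21, 19, 20, 45, 48, 50, 52],
})
df["retraction_rate"] = df["retractions"] / df["orders"]

# Rolling 7-day baseline (mean and standard deviation), lagged one day so
# today's value does not dilute its own baseline.
baseline_mean = df["retraction_rate"].rolling(7, min_periods=3).mean().shift(1)
baseline_std = df["retraction_rate"].rolling(7, min_periods=3).std().shift(1)

# Escalate when today's rate sits more than 3 standard deviations above baseline.
df["escalate"] = df["retraction_rate"] > (baseline_mean + 3 * baseline_std)

print(df[["date", "retraction_rate", "escalate"]])
```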
One quotable line I’ve learned to trust: A batched deployment is a stress test you volunteered to take in production. If you’re doing it, you need instrumentation that can keep up.
Cost transparency is a systems problem—and AI can make it auditable
Answer first: Lifecycle cost estimates fail when they’re disconnected from operational data; AI-enabled cost models can tie spending to measurable work and outcomes, making oversight more concrete.
Lawmakers’ frustration about cost isn’t just “spending bad.” It’s that the spending isn’t legible. A single number (whether $37B or $50B) doesn’t tell you:
- What portion is fixed vs. variable
- What costs decline after each go-live vs. what costs stack
- Whether the program is getting more efficient per deployment
A practical framework: from “lifecycle estimate” to “cost per capability”
Instead of defending one big estimate, agencies do better when they can show cost and progress in units that match how care is delivered.
Examples of capability-based cost tracking for an EHR program:
- Cost to deploy core outpatient workflows per medical center
- Cost to stabilize within 30/60/90 days post-go-live
- Cost per interface family (lab, pharmacy, radiology) brought to target reliability
- Cost per clinician trained to proficiency (measured, not assumed)
AI can help here by automating the messy part: joining data across finance, contracting, labor hours, ticketing, and operational metrics. When those sources stay siloed, your “cost model” becomes a PowerPoint argument. When they’re joined, it becomes something you can audit.
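To make “joined, not siloed” tangible, here is a small sketch, assuming the relevant finance and deployment exports can be pulled into pandas: roll spend up by site, join it to delivery artifacts, and compute capability-based unit costs. The site names, categories, and figures are invented for illustration.

```python
# Sketch: join cost data to delivery artifacts so "cost per capability"
# can be computed and audited. All names and figures are hypothetical;
# real programs would pull from finance, contracting, and ITSM systems.
import pandas as pd

costs = pd.DataFrame({
    "site": ["Site A", "Site A", "Site B", "Site B"],
    "category": ["training", "infrastructure", "training", "infrastructure"],
    "spend_usd": [1_200_000, 3_400_000, 1_500_000, 4_100_000],
})
deployments = pd.DataFrame({
    "site": ["Site A", "Site B"],
    "clinicians_trained": [800, 1100],
    "interfaces_live": [42, 51],
})

# Roll spend up to the site level, then join it to delivery artifacts.
site_costs = costs.groupby("site", as_index=False)["spend_usd"].sum()
merged = site_costs.merge(deployments, on="site")

# Capability-based unit costs that a non-IT leader can challenge.
merged["cost_per_clinician_trained"] = merged["spend_usd"] / merged["clinicians_trained"]
merged["cost_per_interface_live"] = merged["spend_usd"] / merged["interfaces_live"]

print(merged)
```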
If you’re leading a public sector modernization program, aim for this standard:
Every major cost line should map to a measurable delivery artifact and an operational performance indicator.
That’s how you stop “blank check” critiques before they start.
“Contingency plans” can’t be a hallway conversation
Answer first: For high-stakes health IT, contingency plans must be pre-decided, measurable, and rehearsed; AI can support “go/no-go” decisions by turning readiness into a scorecard.
A telling moment in the hearing: when asked what happens if the Michigan go-lives fail, the answer sounded like a discussion to be had later. That’s not unusual, but it’s not acceptable at this risk level.
A real contingency plan for EHR deployment answers three questions in advance:
- What triggers a delay?
- Who has authority to call it?
- What happens operationally the next morning if the system is unstable?
An AI-ready go/no-go checklist (what to measure)
This is where AI in government can be quietly powerful: turn readiness from vibes into signals.
Pre-go-live readiness indicators that can be quantified:
- Ticket burn-down rate and recurrence (are old defects resurfacing?)
- Training completion plus proficiency checks (simulations, observed tasks)
- Interface error rates under peak-load tests
- Clinical workflow simulations (time-to-order, time-to-document, time-to-discharge)
- Downtime procedures readiness (measured drills, not policy memos)
AI can then weight these into a readiness score—not to replace leaders, but to give leaders a defensible, consistent basis for decisions.
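Here is a minimal sketch of what that weighting could look like: each indicator carries a pre-agreed target and weight, and the output includes both the score and the failing gates so the decision stays explainable. All targets, weights, and measured values below are placeholders.

```python
# Sketch: turn readiness indicators into a weighted score plus a list of
# failing gates. Weights, targets, and measured values are placeholders;
# the point is a consistent, defensible basis for go/no-go decisions.
indicators = {
    # name: (measured, target, higher_is_better, weight)
    "defect_recurrence_rate": (0.04, 0.05, False, 0.25),
    "training_proficiency_pass_rate": (0.91, 0.90, True, 0.25),
    "interface_error_rate_peak_load": (0.012, 0.010, False, 0.30),
    "downtime_drill_completion": (1.00, 1.00, True, 0.20),
}

score = 0.0
failing = []
for name, (measured, target, higher_is_better, weight) in indicators.items():
    meets_target = measured >= target if higher_is_better else measured <= target
    # Credit the full weight only when the gate is met; partial-credit
    # schemes are possible but harder to defend in an oversight setting.
    score += weight if meets_target else 0.0
    if not meets_target:
        failing.append(name)

print(f"Readiness score: {score:.2f} of 1.00")
print("Failing gates:", failing or "none")
```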
A stance worth stating plainly: If you can’t explain your go/no-go decision with numbers, you don’t have a go/no-go process.
Where AI actually fits in federal health modernization (and where it doesn’t)
Answer first: AI helps most when it reduces uncertainty in operations—support load, safety signals, cost drivers—not when it’s treated as a branding layer on top of unfinished governance.
Government health organizations are under pressure to “do AI,” especially heading into 2026 budget justifications and acquisition planning. The temptation is to bolt a generative assistant onto the EHR and call it transformation.
That’s backwards.
For a program like VA’s, AI value shows up in three down-to-earth places:
1) Operational intelligence for deployments
- Forecast help desk demand by role and clinic
- Recommend staffing levels for command centers
- Detect recurring defects across sites
2) Safety and quality monitoring
- Identify abnormal ordering patterns post-go-live
- Detect documentation drop-offs that correlate with clinician burnout
- Surface clusters of near-miss incidents earlier
3) Financial accountability and oversight
- Automate cost allocation from contract line items to delivered capabilities
- Track cost per deployment wave and show learning curves (or the lack of them)
- Flag spend anomalies that require leadership attention
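On that last point, a minimal sketch of spend-anomaly flagging: compare each category’s latest month against its own trailing baseline and escalate large ratios. The categories, figures, and the 1.5x multiplier are invented for illustration.

```python
# Minimal sketch: flag spend categories whose latest month runs far above
# their trailing baseline. Figures and categories are invented.
import pandas as pd

spend = pd.DataFrame({
    "month": pd.to_datetime(["2026-01-01"] * 3 + ["2026-02-01"] * 3 + ["2026-03-01"] * 3),
    "category": ["consulting", "infrastructure", "training"] * 3,
    "spend_usd": [2.0e6, 5.0e6, 1.0e6,
                  2.1e6, 5.2e6, 1.1e6,
                  2.0e6, 9.8e6, 1.0e6],
})

# Compare the latest month to the mean of the earlier months, per category.
latest_month = spend["month"].max()
history = spend[spend["month"] < latest_month]
latest = spend[spend["month"] == latest_month]

baseline = (
    history.groupby("category", as_index=False)["spend_usd"]
    .mean()
    .rename(columns={"spend_usd": "baseline_usd"})
)
flagged = latest.merge(baseline, on="category")
flagged["ratio"] = flagged["spend_usd"] / flagged["baseline_usd"]

# Escalate anything running above 1.5x its own baseline; the multiplier
# is a policy choice, not a statistical law.
print(flagged[flagged["ratio"] > 1.5][["category", "spend_usd", "baseline_usd", "ratio"]])
```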
Where AI doesn’t help: papering over broken processes. If training is weak, data governance is unclear, or clinical leadership isn’t empowered, AI will simply help you fail faster.
What public sector leaders should do before the next big go-live
Answer first: The best way to reduce EHR deployment risk is to treat the next go-live as a measurable experiment with hard gates, strong telemetry, and a transparent cost model.
If you’re overseeing (or selling into) federal digital transformation, here’s what works in practice:
- Publish a cost breakdown that a non-IT leader can challenge
  - Include consulting, infrastructure, support, training, and ongoing operations.
- Define a “stabilization SLO” for the first 90 days
  - Example: ticket response time targets, system uptime targets, interface reliability targets (see the gate-check sketch after this list).
- Instrument clinical workflows, not just servers
  - Track time-to-complete for core tasks and compare pre/post go-live.
- Stand up a safety signal dashboard with escalation rules
  - Make it clear who gets paged and what triggers escalation.
- Run a simulated contingency drill
  - Practice downtime procedures and recovery like it’s real—because it will be.
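For the stabilization SLO step above, here is a small sketch of the gate expressed as code rather than a memo: explicit targets, measured values, and a pass/fail result with the specific breaches listed. The metric names and numbers are placeholders, not VA figures.

```python
# Sketch: express the 90-day "stabilization SLO" as explicit targets and
# evaluate measured values against them. All values are placeholders.
SLO_TARGETS = {
    "median_ticket_response_minutes": ("<=", 30),
    "system_uptime_percent": (">=", 99.9),
    "interface_delivery_success_percent": (">=", 99.5),
    "time_to_complete_med_order_seconds": ("<=", 90),
}

measured = {
    "median_ticket_response_minutes": 42,
    "system_uptime_percent": 99.93,
    "interface_delivery_success_percent": 99.1,
    "time_to_complete_med_order_seconds": 85,
}

def evaluate(targets: dict, observed: dict) -> list:
    """Return the list of SLO breaches; an empty list means the gate passes."""
    breaches = []
    for name, (op, target) in targets.items():
        value = observed[name]
        ok = value <= target if op == "<=" else value >= target
        if not ok:
            breaches.append(f"{name}: {value} vs target {op} {target}")
    return breaches

breaches = evaluate(SLO_TARGETS, measured)
print("Stabilization gate:", "PASS" if not breaches else "FAIL")
for breach in breaches:
    print(" -", breach)
```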
These steps don’t require miracles. They require discipline.
What the VA case teaches the rest of government about AI and modernization
The VA EHR program is a high-profile test of whether federal agencies can modernize mission-critical systems while staying accountable on cost and safety. The details are VA-specific, but the pattern is universal: modernization is easy to approve and hard to operationalize.
If you’re working in the AI in Government & Public Sector space, take the lesson seriously: AI is most useful when it tightens the feedback loop between what you spend, what you deploy, and what patients experience. That’s where confidence comes from—inside the agency and on the Hill.
If you’re planning batched deployments in 2026, here’s the forward-looking question that matters more than your slide deck: What will you know in week one that you didn’t know on day one—and how fast can you act on it?