AI Accountability for Intelligence Failures: Russia’s Case

AI in Defense & National Security · By 3L3C

AI accountability can make intelligence failures harder to hide. Learn what Russia’s SVR, GRU, and FSB reveal—and how AI can improve oversight.

Tags: AI governance · Intelligence analysis · Russia · FSB · GRU · SVR · Disinformation



A hard truth from the Ukraine war is that intelligence failure isn’t always punished—sometimes it’s rewarded. When an intelligence service is built to protect a leader rather than test reality, the feedback loop breaks. Bad assumptions survive. Flattering assessments travel upward. Dissent gets treated like disloyalty.

That’s the uncomfortable through-line in expert commentary on Russia’s intelligence services—the SVR, GRU, and FSB—after years of missed calls and poor performance tied to Ukraine. And it’s also why this topic belongs in an “AI in Defense & National Security” series: the same data-rich environment that enables modern intelligence operations can also enable AI-driven accountability, even when institutions resist it.

The point isn’t that AI can “fix” authoritarian politics. It can’t. The point is more practical: AI can make failure harder to hide, improve analytic tradecraft, and create measurable standards for performance—especially for democratic oversight bodies, defense leaders, and national security organizations that actually want to learn.

Why Russia’s intelligence failures won’t be “investigated”

Russia is unlikely to conduct a Western-style after-action review because its intelligence services are not just agencies—they’re pillars of regime survival. The SVR, GRU, and especially the FSB function as a Praetorian system: their first job is protecting the Kremlin’s hold on power.

In that structure, “accountability” is dangerous. If the services are publicly blamed for major failures—like misjudging Ukraine’s resistance, the West’s cohesion, or Russia’s own capabilities—then the myth of state competence cracks. And in Russia, perceived weakness is politically costly.

A second dynamic matters just as much: inter-service rivalry. Russia doesn’t operate like a single integrated intelligence community. The services compete, hoard information, and undermine each other. When the war’s ledger is tallied, the most likely outcome isn’t reform. It’s finger-pointing, selective scapegoating, and internal consolidation of power—particularly by the FSB.

From an AI and oversight perspective, here’s the key insight:

When incentives reward loyalty over accuracy, intelligence failures become a feature of the system—not a bug.

That’s exactly the kind of incentive failure that measurement systems (including AI-driven ones) are designed to expose.

The SVR, GRU, and FSB: different missions, same incentive problem

Russia’s services will tell a “success story” after the war because their careers depend on it. But each service will frame success differently—and those narratives create identifiable data trails that AI systems can evaluate.

SVR: influence operations that mattered “only in the margins”

The SVR’s strongest claim tends to be political warfare: covert influence, propaganda, agent networks, and disinformation—what Russia historically called active measures.

In the Ukraine war, the SVR can point to moments of Western hesitation or political friction. But the measurable outcome is harder to fake: Ukraine didn’t collapse, European support didn’t dissolve, and NATO added Finland and Sweden, which lengthened the NATO-Russia border and intensified long-term containment.

Where AI fits: influence operations produce patterns—content themes, amplification networks, bot behaviors, and cross-platform narratives. AI can help democratic institutions measure:

  • Narrative propagation speed and geographic spread
  • Coordination signatures across accounts and channels
  • Which messages correlate with real-world outcomes (votes, protests, policy delays)
  • The “decay rate” of disinformation after public attribution
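To make the last item concrete, here is a minimal sketch of a decay-rate estimate, assuming a narrative-monitoring pipeline already produces daily mention counts indexed by date. The data, the exponential-decay model, and every name in the code are illustrative assumptions, not a description of any fielded system.

```python
# Hypothetical sketch: estimate how fast a disinformation narrative fades after
# public attribution. Assumes daily mention counts from narrative monitoring;
# the synthetic data below is only for demonstration.
import numpy as np
import pandas as pd

def narrative_decay_rate(daily_counts: pd.Series, attribution_date: str) -> float:
    """Fit an exponential decay (count ~ A * exp(-k * days)) to post-attribution
    mention counts and return the decay constant k (per day).
    Higher k means the narrative fades faster after exposure."""
    post = daily_counts.loc[attribution_date:]      # mentions from attribution onward
    days = np.arange(len(post), dtype=float)        # days since attribution
    counts = post.to_numpy(dtype=float) + 1.0       # +1 avoids log(0)
    # Linear fit on log counts: log(count) = log(A) - k * days
    k = -np.polyfit(days, np.log(counts), deg=1)[0]
    return float(k)

# Toy usage with synthetic data: flat volume, then decay after attribution
idx = pd.date_range("2025-01-01", periods=60, freq="D")
mentions = pd.Series(
    np.concatenate([np.full(30, 500), 500 * np.exp(-0.15 * np.arange(30))]),
    index=idx,
)
print(f"decay constant ≈ {narrative_decay_rate(mentions, '2025-01-31'):.2f} per day")
```

A narrative whose decay constant stays near zero after attribution is telling you something different than one that collapses within days, and that difference is exactly what an oversight body can track over time.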

This changes the oversight conversation from “Was the SVR effective?” to “What did the operation measurably change, and for how long?”

GRU: sabotage abroad, attrition at the front

The GRU can claim relevance through sabotage and covert action—arson, cable cuts, assassination attempts, intimidation campaigns. These activities can be tactically disruptive, but the strategic goal was larger: deterring Europe and NATO from supporting Ukraine. By that yardstick, the effect looks limited.

The GRU also suffered a self-inflicted wound: the war’s early-phase failures burned through elite Spetsnaz capacity. When special operations forces are used as expendable infantry or thrown into failed decapitation attempts, you don’t just lose personnel; you lose institutional experience.

Where AI fits: covert action and battlefield employment create “accountability exhaust”—logistics, movement, communications, procurement, casualty flows, and repair cycles. AI-driven fusion of open-source intelligence and classified collection can help answer:

  • Did sabotage correlate with measurable supply-chain disruption?
  • Which units repeatedly failed mission profiles with similar preconditions?
  • Where did command assumptions diverge from observable readiness indicators?
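Here is a minimal sketch of the first question, assuming a fused event log and a daily disruption index already exist. The series names, the lagged-correlation approach, and the synthetic data are all assumptions made for illustration.

```python
# Hypothetical sketch: test whether sabotage events correlate with a downstream
# supply-chain disruption indicator at various lags. The event log and the
# disruption index stand in for whatever fused OSINT/logistics data is available.
import numpy as np
import pandas as pd

def lagged_correlation(events: pd.Series, disruption: pd.Series, max_lag: int = 14) -> pd.Series:
    """Pearson correlation between daily event counts and the disruption index,
    with the disruption series shifted so lag N means 'N days after the events'."""
    corrs = {
        lag: events.corr(disruption.shift(-lag))
        for lag in range(max_lag + 1)
    }
    return pd.Series(corrs, name="correlation")

# Toy usage: synthetic data where disruption echoes events roughly 3 days later
idx = pd.date_range("2025-06-01", periods=120, freq="D")
rng = np.random.default_rng(0)
events = pd.Series(rng.poisson(0.4, len(idx)), index=idx, dtype=float)
disruption = 5 * events.shift(3).fillna(0) + rng.normal(0, 1, len(idx))
print(lagged_correlation(events, disruption).idxmax())  # expect a lag near 3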

Put simply: AI can quantify whether “shadow war” actions produced strategic effects or just headlines.

FSB: the worst calls, the strongest position

If one service “owned” the pre-war assumptions about Ukraine, it was the FSB—particularly components responsible for the near abroad and political assessments. The FSB’s failures were profound: misreading Ukrainian resilience, overestimating Russia’s operational readiness, and misjudging NATO and European reaction.

Yet the FSB is positioned to gain influence after the war because it plays the core internal security role: repression, surveillance, political control, and counterintelligence. In regime terms, it’s not optional.

This is the paradox democracies should pay attention to:

The institution most responsible for bad assessments can still become the institution that “investigates” the failure.

That’s not just a Russia problem. Any organization with weak oversight can drift into self-auditing.

What “AI-driven accountability” really means in intelligence

AI-driven accountability isn’t a magic scoring system that declares an agency “good” or “bad.” It’s a governance capability: using machine-assisted methods to track analytic quality, decision latency, uncertainty, and outcome alignment.

Here are four practical models that work in real national security environments.

1) Prediction auditing (track calls, not vibes)

Most intelligence organizations keep archives of assessments, but they often don’t treat them like measurable forecasts. A prediction-audit program does.

An AI-enabled approach can:

  • Extract claims from finished intelligence (e.g., “Kyiv will fall within X days”)
  • Tag confidence language and underlying assumptions
  • Link claims to outcome windows
  • Score calibration (how often “high confidence” is right)
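Here is a minimal sketch of the scoring step, assuming claims have already been extracted and resolved against their outcome windows. The record fields, the Brier score, and the bucketed calibration table are one reasonable accounting, not a prescribed standard.

```python
# Hypothetical sketch of a prediction-audit scorer. Assumes claims have been
# extracted from finished intelligence and resolved; the fields are illustrative.
from dataclasses import dataclass

@dataclass
class AuditedClaim:
    claim: str            # e.g. a falsifiable statement with a time window
    confidence: float     # probability implied by confidence language (0-1)
    outcome: bool         # did the claim verify within its outcome window?

def brier_score(claims: list[AuditedClaim]) -> float:
    """Mean squared error between stated confidence and outcome (lower is better)."""
    return sum((c.confidence - float(c.outcome)) ** 2 for c in claims) / len(claims)

def calibration_table(claims: list[AuditedClaim], bins: int = 5) -> dict[str, tuple[float, int]]:
    """For each confidence bucket, report the observed hit rate and sample size.
    Well-calibrated analysis hits at roughly the rate its confidence language implies."""
    table: dict[str, tuple[float, int]] = {}
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        bucket = [c for c in claims
                  if lo <= c.confidence < hi or (b == bins - 1 and c.confidence == 1.0)]
        if bucket:
            hit_rate = sum(c.outcome for c in bucket) / len(bucket)
            table[f"{lo:.1f}-{hi:.1f}"] = (round(hit_rate, 2), len(bucket))
    return table

claims = [
    AuditedClaim("Capital falls within 7 days", 0.9, False),
    AuditedClaim("Coalition fractures within 90 days", 0.7, False),
    AuditedClaim("Mobilization announced this quarter", 0.6, True),
]
print(brier_score(claims), calibration_table(claims))
```

The useful output is the gap between stated confidence and observed hit rate; tracked over time, that gap is the learning signal.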

This is how you turn “we assessed…” into institutional learning.

2) Assumption tracing (find the hidden load-bearing beams)

Big analytic failures usually share a trait: a few assumptions quietly carry the whole narrative.

AI can help by building “assumption maps” across products:

  • Which assumptions appear repeatedly across teams?
  • Which assumptions lack collection plans?
  • Which assumptions persist after contradictory evidence?
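A minimal sketch of an assumption map follows, assuming products have already been tagged with the assumptions they rely on, whether by analysts or by an extraction model. The fields and example strings are hypothetical.

```python
# Hypothetical sketch of an assumption map across analytic products.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Product:
    title: str
    assumptions: list[str]
    collection_plans: set[str] = field(default_factory=set)  # assumptions with active collection
    contradicted: set[str] = field(default_factory=set)      # assumptions contradicted by reporting

def assumption_map(products: list[Product]) -> list[dict]:
    """Rank assumptions by how many products rely on them, and flag the risky ones:
    widely used, no collection plan, or still in use after contradiction."""
    usage = Counter(a for p in products for a in p.assumptions)
    report = []
    for assumption, count in usage.most_common():
        planned = any(assumption in p.collection_plans for p in products)
        contradicted = any(assumption in p.contradicted for p in products)
        report.append({
            "assumption": assumption,
            "products_relying_on_it": count,
            "has_collection_plan": planned,
            "persists_after_contradiction": contradicted,
        })
    return report

products = [
    Product("Invasion timeline", ["defenders collapse quickly", "allies stay divided"]),
    Product("Escalation outlook", ["allies stay divided"], contradicted={"allies stay divided"}),
]
for row in assumption_map(products):
    print(row)
```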

This matters because Russia’s problem, as described by experienced observers, wasn’t just missing data. It was a detachment from reality reinforced by internal incentives.

3) Collection-to-impact analytics (was the intel used?)

Intelligence doesn’t fail only because analysis is wrong. It fails when decision-makers ignore it, misunderstand it, or never receive it in a usable form.

AI can track:

  • Time from collection → processing → dissemination
  • Which products were briefed vs. read vs. acted upon
  • Where bottlenecks occurred (translation, validation, classification barriers)
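A minimal sketch of that telemetry, assuming a production workflow logs a timestamp at each stage. The stage names and example timestamps are assumptions, not a real pipeline.

```python
# Hypothetical sketch of workflow telemetry: per-stage latency from collection
# to dissemination, and the transition that costs the most time on average.
from datetime import datetime

STAGES = ["collected", "processed", "translated", "validated", "disseminated"]

def stage_latencies(timestamps: dict[str, datetime]) -> dict[str, float]:
    """Hours spent between consecutive stages for one product."""
    return {
        f"{a}→{b}": (timestamps[b] - timestamps[a]).total_seconds() / 3600
        for a, b in zip(STAGES, STAGES[1:])
        if a in timestamps and b in timestamps
    }

def worst_bottleneck(products: list[dict[str, datetime]]) -> str:
    """Across many products, which stage transition is slowest on average?"""
    totals: dict[str, list[float]] = {}
    for ts in products:
        for stage, hours in stage_latencies(ts).items():
            totals.setdefault(stage, []).append(hours)
    return max(totals, key=lambda s: sum(totals[s]) / len(totals[s]))

# Toy usage: translation is the 36-hour bottleneck in this example
report = {
    "collected": datetime(2025, 3, 1, 6, 0),
    "processed": datetime(2025, 3, 1, 8, 0),
    "translated": datetime(2025, 3, 2, 20, 0),
    "validated": datetime(2025, 3, 2, 22, 0),
    "disseminated": datetime(2025, 3, 3, 1, 0),
}
print(worst_bottleneck([report]))  # "processed→translated"
```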

In high-tempo conflicts, minutes and hours matter. AI-enabled workflow telemetry makes delay visible—and fixable.

4) Red-teaming at scale (adversarial thinking as a service)

Human red teams are expensive and limited. AI systems can support red teams by generating structured alternative hypotheses, stress-testing plans, and simulating adversary incentives.

Used responsibly, this can counter a classic failure mode: planning based on wishful thinking.
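One simple structure for this is an analysis-of-competing-hypotheses (ACH) matrix that a red team, or a model working under human review, could populate. The hypotheses, evidence items, and scores below are placeholders meant to show the accounting, not real assessments.

```python
# Hypothetical sketch of structured red-teaming support: a tiny ACH matrix.
# Rows are hypotheses, columns are evidence items.
# Scores: +1 consistent, 0 neutral, -1 inconsistent.
hypotheses = {
    "Adversary escalates in the gray zone": {"sabotage tempo rising": 1, "public threats": 0, "force regeneration slow": 1},
    "Adversary seeks a negotiated pause":   {"sabotage tempo rising": -1, "public threats": 0, "force regeneration slow": 1},
    "Adversary prepares a new offensive":   {"sabotage tempo rising": 0, "public threats": 1, "force regeneration slow": -1},
}

def rank_hypotheses(matrix: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """ACH heuristic: prefer the hypotheses with the least inconsistent evidence,
    rather than the ones with the most confirming evidence."""
    inconsistency = {h: sum(1 for s in ev.values() if s < 0) for h, ev in matrix.items()}
    return sorted(inconsistency.items(), key=lambda kv: kv[1])

for hypothesis, contradictions in rank_hypotheses(hypotheses):
    print(f"{contradictions} inconsistent item(s): {hypothesis}")
```

Ranking by disconfirmation rather than confirmation is the point: it pushes against the wishful-thinking failure mode described above.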

The oversight challenge: AI won’t create accountability unless leaders demand it

Accountability is political before it’s technical. Russia’s services can avoid a reckoning because the Kremlin’s priority is regime stability, not institutional reform.

Democracies have the opposite risk: they can build advanced AI systems and still fail if they don’t fix incentives and oversight.

If you’re designing AI for defense and national security, I’ve found these principles separate “useful” from “expensive theater”:

  • Make auditability a requirement, not a feature. If outputs can’t be traced to inputs and assumptions, you’re scaling confusion.
  • Measure calibration, not confidence. Analysts (and models) can sound certain and be wrong.
  • Design for contestability. The system should make it easy to challenge a conclusion and see what changes.
  • Protect dissent. If contrarian analysis gets punished, AI will mostly optimize for institutional comfort.
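As a sketch of what the auditability principle can look like in practice, each AI-assisted product could carry a provenance record tying its judgments to inputs, assumptions, model version, stated confidence, and a human reviewer. The field names here are illustrative, not a standard.

```python
# Hypothetical sketch of a provenance record for an AI-assisted analytic product.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ProvenanceRecord:
    product_id: str
    source_report_ids: tuple[str, ...]   # which inputs the judgment rests on
    assumptions: tuple[str, ...]         # explicit, challengeable assumptions
    model_version: str                   # which model/prompt version assisted
    stated_confidence: float             # 0-1, feeds later calibration scoring
    human_reviewer: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = ProvenanceRecord(
    product_id="assessment-0042",
    source_report_ids=("rpt-17", "rpt-23"),
    assumptions=("defender morale holds through winter",),
    model_version="triage-model-2025.10",
    stated_confidence=0.6,
    human_reviewer="analyst-a",
)
print(record)
```

A record like this is what makes contestability cheap: challenging a conclusion starts with the listed assumptions and sources, not with an argument about who said what.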

A direct lesson from Russia’s experience is that a closed feedback loop creates strategic surprise. AI can widen the loop—but only if governance allows it.

What Western defense and intelligence teams should do next

Russia’s likely post-war dynamic—services claiming success, avoiding reforms, and continuing covert pressure—should be treated as a planning assumption for 2026.

Here are concrete steps for organizations building AI programs for defense and national security:

  1. Stand up an “intel performance dashboard” that tracks forecast accuracy, decision timelines, and confidence calibration across mission areas.
  2. Institutionalize AI-assisted red-teaming for war planning, escalation assessment, and gray-zone activity.
  3. Build disinformation measurement pipelines that connect narrative monitoring to real-world indicators (policy delays, protests, polarization metrics).
  4. Create governance rules for model use in analytic production (traceability, versioning, confidence reporting, and human review).
  5. Plan for adaptive Russian tradecraft: more sabotage, more influence operations, more deniable proxies—because the system that enabled failure is still intact.
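As a sketch of what the first step's rollup might compute, assume the audit pipeline already emits one row per assessment. The mission areas, numbers, and metric choices are placeholders.

```python
# Hypothetical sketch of an "intel performance dashboard" rollup: hit rate,
# average stated confidence, and decision latency per mission area.
from statistics import mean

rows = [
    # mission_area, stated confidence, outcome (0/1), hours from assessment to decision
    ("indications-warning", 0.9, 0, 18.0),
    ("indications-warning", 0.6, 1, 6.5),
    ("gray-zone",           0.7, 1, 30.0),
    ("gray-zone",           0.8, 0, 12.0),
]

def dashboard(rows):
    """Per mission area: hit rate vs. average confidence (a rough calibration
    signal) and average decision latency in hours."""
    areas = {}
    for area, conf, outcome, latency in rows:
        areas.setdefault(area, []).append((conf, outcome, latency))
    return {
        area: {
            "hit_rate": round(mean(o for _, o, _ in vals), 2),
            "avg_confidence": round(mean(c for c, _, _ in vals), 2),
            "avg_decision_latency_h": round(mean(l for _, _, l in vals), 1),
        }
        for area, vals in areas.items()
    }

for area, metrics in dashboard(rows).items():
    print(area, metrics)
```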

None of this requires believing AI is a silver bullet. It requires believing something simpler: systems improve when performance is measurable and incentives reward accuracy.

Where this goes in the AI in Defense & National Security series

This post fits a pattern we keep returning to in the AI in Defense & National Security series: the hardest part of adopting AI isn’t model selection—it’s institutional design. AI can strengthen intelligence analysis, speed up collection triage, and support decision-making under uncertainty. But its most underrated role is also the least glamorous: creating durable oversight and accountability.

Russia’s intelligence services show what happens when accountability is structurally impossible. For Western defense leaders and oversight institutions, that’s a warning—and an opportunity. The war won’t end the contest. It will shape the next phase.

If you’re building AI-enabled intelligence workflows, the question worth carrying into 2026 is this: Are you using AI to produce more assessments—or to produce better, testable ones that your organization can learn from?