Train an AI-ready SOC like a triathlete: improve telemetry coverage, standardize evidence, then apply AI for faster triage and confident response.

Train Your AI-Ready SOC Like a Triathlete
Most SOCs don’t have an “AI problem.” They have an evidence problem.
By late 2025, it’s normal for security teams to have at least one AI-assisted workflow in production—LLM-powered investigation notes, automated enrichment, summarization in the SIEM, or a vendor’s “AI analyst” layer. And yet a weird pattern keeps showing up: teams spend real money on AI, but mean time to understand (MTTU) barely improves, analyst fatigue stays high, and leadership still hears too many variations of “we can’t prove it.”
Here’s the stance I’ll defend: AI makes a SOC faster only after you make the SOC measurable. The cleanest way I’ve found to explain it is the triathlon metaphor—swim, bike, run. Not because it’s cute, but because it forces you to think about inputs, consistency, and endurance as a system.
This post is part of our AI in Cybersecurity series, and the goal is practical: how to build an AI-ready security operations center that can handle today’s edge-device exploits, living-off-the-land attacks, and long-dwell intrusions without burning out your team.
Swim: Build observable coverage that AI can trust
If your data retention is two weeks, your SOC is training for a sprint while attackers are running a marathon.
The “swim” portion of SOC fitness is readiness through observation. Alerts are a starting gun, not a finish line. When an alert triggers, the analyst (and any AI assisting them) needs enough high-quality evidence to answer basic questions quickly:
- What happened first?
- What else did the same actor touch?
- Is this normal for this user/host/service?
- What’s the blast radius?
What “good” looks like: coverage + time
Most teams feel like they have broad coverage until they measure it. When you actually do the math—what percentage of endpoints are logging correctly, what percentage of cloud control planes are in scope, how much east-west traffic is visible—the number is often lower than expected.
A practical target for many mid-to-large environments:
- 90–95% environment coverage for your core telemetry (network + endpoint + identity + cloud)
- 6–12 months of searchable telemetry for investigations and baselining
That time window isn’t arbitrary. Attackers routinely sit inside environments for weeks or months. If you can’t see “before,” you can’t establish what’s abnormal “now.” And AI can’t magically infer missing evidence; it will just generate confident-sounding narratives from partial truth.
A simple “Swim Score” you can implement this quarter
If you want an answer-first metric your leadership will understand, start here:
- Retention depth (days): For each core data source, how many days are searchable?
- Coverage breadth (%): What percent of assets/users/accounts actually produce usable events?
- Query latency (seconds): How long does it take an analyst (or AI tool) to retrieve context?
Create a one-page dashboard: four rows (network, endpoint, identity, cloud), three columns (retention, coverage, latency). You’ll expose your real constraints fast.
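To make that concrete, here is a minimal sketch in Python of what that four-row, three-column view could look like as code. The source names, retention, coverage, and latency figures are invented placeholders standing in for your own measurements.

```python
from dataclasses import dataclass

@dataclass
class TelemetrySource:
    name: str               # e.g. "endpoint", "identity"
    retention_days: int     # how far back queries actually return results
    coverage_pct: float     # % of assets producing usable events
    query_latency_s: float  # typical time to pull context for one entity

# Placeholder measurements -- replace with numbers from your own environment.
sources = [
    TelemetrySource("network",  retention_days=30,  coverage_pct=88.0, query_latency_s=4.2),
    TelemetrySource("endpoint", retention_days=180, coverage_pct=93.5, query_latency_s=1.1),
    TelemetrySource("identity", retention_days=365, coverage_pct=97.0, query_latency_s=0.8),
    TelemetrySource("cloud",    retention_days=90,  coverage_pct=71.0, query_latency_s=6.5),
]

print(f"{'source':<10} {'retention_d':>11} {'coverage_%':>10} {'latency_s':>9}")
for s in sources:
    # Flag anything below the targets above (90% coverage, ~6 months retention).
    flag = " <-- gap" if s.coverage_pct < 90 or s.retention_days < 180 else ""
    print(f"{s.name:<10} {s.retention_days:>11} {s.coverage_pct:>10.1f} {s.query_latency_s:>9.1f}{flag}")
```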
Where AI fits in the Swim stage
AI helps most in swim-readiness when it’s used to validate and monitor telemetry health:
- Detecting logging gaps (sudden drops in event volume by asset class)
- Identifying drift in sensor configs
- Flagging “quiet” segments that should never be quiet
That’s not glamorous work, but it’s high-leverage. If the data feed is unreliable, your AI-assisted SOC becomes a confidence factory for the wrong conclusions.
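As a rough illustration of the "sudden drops in event volume" check, here is a small Python sketch. The asset classes, counts, and 50% drop threshold are placeholders; a real implementation would pull daily volumes from your SIEM or log pipeline.

```python
from statistics import mean

def logging_gaps(daily_counts: dict[str, list[int]], drop_threshold: float = 0.5) -> list[str]:
    """Flag asset classes whose latest daily event volume fell sharply below baseline.

    daily_counts maps an asset class (e.g. "domain_controllers") to a list of
    daily event counts, oldest first. The last value is "today".
    """
    gaps = []
    for asset_class, counts in daily_counts.items():
        if len(counts) < 8:
            continue  # not enough history to establish a baseline
        baseline = mean(counts[:-1])
        today = counts[-1]
        # A drop of more than drop_threshold (50% by default) vs. baseline is suspicious.
        if baseline > 0 and today < baseline * drop_threshold:
            gaps.append(f"{asset_class}: {today} events today vs ~{baseline:.0f}/day baseline")
    return gaps

# Hypothetical volumes -- in practice these come from your SIEM's ingest metadata.
volumes = {
    "domain_controllers": [52000, 51800, 53100, 52500, 52900, 51700, 52200, 52400],
    "branch_firewalls":   [18000, 17500, 18200, 17900, 18100, 17800, 18000, 2100],
}
for gap in logging_gaps(volumes):
    print("possible logging gap:", gap)
```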
Bike: Standardize evidence so investigations don’t wobble
In triathlon terms, biking is about holding your line. In SOC terms, it’s about consistency and connectedness.
The most expensive SOC minutes are often wasted on translation:
- “Source” means the process on the endpoint, but the peer on the firewall.
- A single user exists as j.smith, jsmith@company.com, and a GUID in identity logs.
- An IP rotates every few hours in cloud environments.
If humans struggle to reconcile that, AI will struggle too—except it will do it at scale and amplify the confusion.
The AI-ready fix: canonical definitions + entity stitching
Your SOC needs a shared “grammar” for evidence:
- Standardize definitions: source, destination, session, principal, object, action, outcome
- Normalize timestamps: consistent time zones, clock drift checks, consistent precision
- Entity resolution: reliably map IP ↔ host ↔ user ↔ device ↔ workload
When you do this well, you stop investigating “logs.” You investigate stories—sessions, chains of activity, and blast radius.
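Here is a minimal sketch of what entity stitching can look like in practice, assuming an alias table exported from your identity provider or CMDB. The identifiers and field names are invented for illustration.

```python
# Map the many identifiers a single principal shows up under (account name,
# email, directory GUID) to one canonical ID, so events from different tools
# can be stitched together. In practice this table comes from your IdP/CMDB.
ALIAS_TO_CANONICAL = {
    "j.smith": "user:jane.smith",
    "jsmith@company.com": "user:jane.smith",
    "a1b2c3d4-0000-1111-2222-333344445555": "user:jane.smith",
}

def canonical_user(raw_identifier: str) -> str:
    """Return the canonical user ID for any known alias, or mark it unresolved."""
    return ALIAS_TO_CANONICAL.get(raw_identifier.lower(), f"unresolved:{raw_identifier}")

events = [
    {"source": "endpoint", "user": "j.smith", "action": "process_start"},
    {"source": "identity", "user": "A1B2C3D4-0000-1111-2222-333344445555", "action": "token_issued"},
    {"source": "vpn",      "user": "jsmith@company.com", "action": "session_open"},
]

# After stitching, all three events collapse onto the same principal.
for e in events:
    e["principal"] = canonical_user(e["user"])
    print(e["principal"], "<-", e["source"], e["action"])
```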
Treat evidence like a product, not exhaust
A lot of logs were born as diagnostic output. They’re helpful for troubleshooting a tool, but lousy for security investigations. If your SOC relies on 10–15 disconnected log sources, the team spends energy correlating instead of deciding.
What works better is designing evidence as a product:
- Fewer event schemas
- Cross-linked records (session IDs, user IDs, device IDs)
- Clear provenance (where did this fact come from?)
This is also where AI becomes genuinely useful: LLMs are strong at narrative synthesis, but only when the underlying facts have stable meaning.
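For illustration, here is one way a "product-grade" evidence record could be shaped. The field names are assumptions rather than any standard schema, but they show the cross-links and provenance the list above calls for.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class CanonicalEvent:
    """One product-grade evidence record: cross-linked and provenance-aware."""
    timestamp: datetime  # normalized to UTC at ingest
    principal: str       # canonical user ID (after entity resolution)
    device: str          # canonical device ID
    session_id: str      # ties endpoint, identity, and network records together
    action: str          # standardized verb, e.g. "logon", "file_write"
    outcome: str         # "success" / "failure"
    provenance: str      # originating sensor and raw record reference

event = CanonicalEvent(
    timestamp=datetime(2025, 11, 3, 14, 22, 7, tzinfo=timezone.utc),
    principal="user:jane.smith",
    device="device:lt-4821",
    session_id="sess:9f31c2",
    action="logon",
    outcome="success",
    provenance="idp:okta/system_log#evt_abc123",
)
print(event)
```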
A “Bike Plan” for cross-tool AI integration
If your campaign goal is an AI-powered SOC, your integration plan should be explicit:
- Pick your canonical entities: user, device, workload/service, session
- Enforce naming rules: one primary identifier + approved aliases
- Create a correlation contract: every event should include at least two entities (example: device + user)
- Align access controls: AI tools must see what analysts see, with auditable permissions
That last point matters. AI workflows fail quietly when permission boundaries hide critical context. The model outputs become vague because the evidence is missing, not because the model is “bad.”
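One lightweight way to enforce the correlation contract is a check at ingest time, sketched below with assumed entity field names; events that can't be correlated get routed somewhere visible instead of silently polluting investigations.

```python
# Entity field names are assumptions for illustration.
ENTITY_FIELDS = ("principal", "device", "workload", "session_id")

def meets_correlation_contract(event: dict, minimum_entities: int = 2) -> bool:
    """Return True if the event carries enough entity references to correlate."""
    present = [f for f in ENTITY_FIELDS if event.get(f)]
    return len(present) >= minimum_entities

good = {"principal": "user:jane.smith", "device": "device:lt-4821", "action": "logon"}
bad  = {"action": "config_change"}  # no entities: an analyst (or AI) can't pivot from this

print(meets_correlation_contract(good))  # True
print(meets_correlation_contract(bad))   # False -> route to a quarantine queue for fixing
```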
Run: Use AI to turn evidence into confident decisions
The “run” stage is where SOCs either mature—or stall. Running well means you can move from “we think” to “we know” under pressure.
Confidence is not a vibe. It’s the product of evidence quality, correlation discipline, and repeatable workflows.
The metric I trust most: “cause unknown” closures
If you want one SOC fitness metric that cuts through vanity reporting, track this:
- % of cases closed as “cause unknown”
Every time that number drops, your SOC is getting stronger. You’re preserving evidence long enough, correlating it well enough, and investigating consistently enough to determine root cause.
Pair it with two operational metrics:
- Containment time: time to stop spread / isolate the actor
- Reopen rate: cases reopened due to missing context or wrong conclusion
AI should reduce cause-unknown closures and reopen rates. If it doesn’t, it’s probably being fed inconsistent or incomplete inputs.
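If you want to compute these three numbers from your case data, a rough sketch like the one below works. The case records here are hypothetical and would normally come from your ticketing or case-management export.

```python
from datetime import datetime, timedelta

# Hypothetical closed cases -- in practice, export these from your case tool.
cases = [
    {"closed_reason": "phishing",      "detected": datetime(2025, 11, 1, 9, 0),
     "contained": datetime(2025, 11, 1, 10, 30), "reopened": False},
    {"closed_reason": "cause unknown", "detected": datetime(2025, 11, 2, 14, 0),
     "contained": None,                           "reopened": True},
    {"closed_reason": "vuln exploit",  "detected": datetime(2025, 11, 4, 8, 0),
     "contained": datetime(2025, 11, 4, 12, 0),  "reopened": False},
]

cause_unknown_rate = sum(c["closed_reason"] == "cause unknown" for c in cases) / len(cases)
reopen_rate = sum(c["reopened"] for c in cases) / len(cases)
containment_times = [c["contained"] - c["detected"] for c in cases if c["contained"]]
avg_containment = sum(containment_times, timedelta()) / len(containment_times)

print(f"cause-unknown closures: {cause_unknown_rate:.0%}")
print(f"reopen rate:            {reopen_rate:.0%}")
print(f"avg containment time:   {avg_containment}")
```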
A real-world decision pattern: ransom claims vs. exfil reality
During ransomware events, attackers often exaggerate what they stole. A SOC with strong evidence can quantify the gap between claims and reality. If you can prove that only a small fraction of data was actually exfiltrated, leadership can negotiate—or refuse—based on facts, not fear.
That’s “run fitness”: decision-grade confidence.
Where AI belongs in the Run stage (and where it doesn’t)
AI shines when it’s applied to bounded, high-volume work:
- Tier 1 alert triage (grouping, deduplication, prioritization)
- Investigation copilots (timeline building, hypothesis generation)
- Enrichment summaries (what’s this binary/hash/domain associated with?)
- Compliance evidence assembly (who had access, when did config change, who approved it)
AI is a poor substitute for:
- Missing telemetry
- Undefined schemas
- Broken identity mappings
- A SOC that hasn’t agreed on what “done” means
A blunt rule: don’t try to fix coverage gaps and deploy new AI workflows in the same month. You’ll end up debugging two moving targets.
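For the bounded, high-volume end of that spectrum, the grouping-and-deduplication step of Tier 1 triage can be as simple as keying alerts on the detection rule plus canonical entities. The sketch below uses invented field names to show the idea.

```python
from collections import defaultdict

# Alerts sharing the same rule and the same canonical entities collapse into one
# case, so an analyst (or an AI copilot) sees one story, not fifty duplicates.
alerts = [
    {"rule": "suspicious_powershell", "principal": "user:jane.smith", "device": "device:lt-4821"},
    {"rule": "suspicious_powershell", "principal": "user:jane.smith", "device": "device:lt-4821"},
    {"rule": "impossible_travel",     "principal": "user:bob.lee",    "device": "device:lt-0199"},
    {"rule": "suspicious_powershell", "principal": "user:jane.smith", "device": "device:lt-4821"},
]

grouped = defaultdict(list)
for alert in alerts:
    key = (alert["rule"], alert["principal"], alert["device"])
    grouped[key].append(alert)

for (rule, principal, device), members in grouped.items():
    print(f"case: {rule} on {principal}/{device} -- {len(members)} alert(s) collapsed")
```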
Training for the threats attackers actually use in 2026
Attackers go where you’re weak, and right now that weakness is often the messy middle between “edge compromise” and “quiet persistence.”
Across 2025, defenders have been hammered by a familiar chain:
- Initial access via edge devices (VPNs, gateways, SSO integrations, exposed management planes)
- Living-off-the-land movement using built-in tools and legitimate credentials
- Low-and-slow discovery and staging that looks like normal admin work
Traditional malware-centric alerting often misses step 2 entirely.
What an AI-ready SOC does differently
An AI-ready SOC wins here by combining baselining + anomaly detection + human verification:
- Baselining normal authentication paths (who logs into what, from where, with what device posture)
- Detecting identity anomalies (impossible travel, unfamiliar token use, new MFA patterns)
- Spotting lateral movement in network evidence (unusual session patterns, new internal destinations)
- Using LLMs to accelerate the “what does this mean” loop for the 2–5% of alerts that are truly weird
The point isn’t to automate everything. It’s to reserve human cognition for decisions that matter.
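As one concrete example of an identity anomaly check, here is a sketch of impossible-travel detection over logon records. The 900 km/h threshold, the geo fields, and the sample logons are assumptions; a production version would also account for VPN egress points and geo-IP error.

```python
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

def km_between(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(h))

def impossible_travel(logons: list[dict], max_kmh: float = 900.0) -> list[str]:
    """Flag consecutive logons by the same principal that imply travel faster than max_kmh."""
    findings = []
    logons = sorted(logons, key=lambda e: (e["principal"], e["time"]))
    for prev, cur in zip(logons, logons[1:]):
        if prev["principal"] != cur["principal"]:
            continue
        hours = (cur["time"] - prev["time"]).total_seconds() / 3600
        if hours <= 0:
            continue
        speed = km_between(prev["geo"], cur["geo"]) / hours
        if speed > max_kmh:
            findings.append(f'{cur["principal"]}: {speed:.0f} km/h implied between logons')
    return findings

# Hypothetical logon records (principal, UTC time, rough geo-IP location).
logons = [
    {"principal": "user:jane.smith", "time": datetime(2025, 11, 3, 9, 0),  "geo": (51.5, -0.1)},   # London
    {"principal": "user:jane.smith", "time": datetime(2025, 11, 3, 10, 0), "geo": (40.7, -74.0)},  # New York, 1h later
]
for finding in impossible_travel(logons):
    print("identity anomaly:", finding)
```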
A 90-day triathlon plan for an AI-powered SOC
If you want a plan you can execute without replatforming your entire stack, this is it.
Days 1–30: Swim (readiness)
- Inventory your core telemetry sources (network, endpoint, identity, cloud)
- Measure actual coverage % and retention days
- Fix the top two gaps that block investigations (commonly identity logs and endpoint visibility)
- Set a minimum retention goal (start with 90 days if budget is tight, then expand)
Deliverable: a one-page Swim Score dashboard leadership can understand.
Days 31–60: Bike (consistency)
- Publish a SOC evidence dictionary (your definitions for core fields)
- Normalize timestamps and event schemas where feasible
- Implement entity stitching for user/device/workload
- Reduce “log source sprawl” in investigations by standardizing a primary evidence view
Deliverable: a correlation contract and a reduced investigation playbook.
Days 61–90: Run (AI-assisted execution)
- Choose 1–2 AI workflows with clear ROI:
  - Tier 1 triage automation
  - Investigation summarization + timeline building
  - Compliance evidence automation
- Align permissions so the AI tool can see the same evidence as analysts
- Track cause-unknown, containment time, and reopen rate weekly
Deliverable: a measured AI workflow with before/after metrics.
Snippet-worthy truth: If your SOC can’t explain an incident six weeks later, your AI won’t either.
What to do next if you want leads, not just lessons
If you’re building an AI in cybersecurity roadmap for 2026, the triathlon mindset keeps you honest: coverage first, consistency second, AI acceleration third. That sequencing is the difference between an AI-powered SOC and an AI-themed dashboard.
If you’re not sure where you stand, run a short internal assessment using the Swim Score and the cause-unknown metric. I’ve found that two hours of measurement beats two months of tool debates.
Where is your SOC most out of shape right now—coverage, consistency, or confidence—and what would it take to make that your strongest discipline by the end of Q1?