Agentic AI for Government Work: Lessons for Defense

AI in Government & Public Sector • By 3L3C

Agentic AI can move beyond chatbots to complete government workflows. Learn what State’s rollout teaches defense teams about autonomy, control, and adoption.

Agentic AI · Federal IT · Workflow Automation · National Security · Cybersecurity Operations · Change Management



The State Department's internal generative AI chatbot grew from 3,000 beta testers to roughly 45,000–50,000 active users, out of a workforce of about 80,000. That adoption curve is the real headline, not the chatbot itself.

Because once people trust an AI assistant to answer questions—“How much leave do I have?”—the next demand is obvious: “Great, now do the thing for me.” That’s exactly the jump the State Department CIO, Kelly Fletcher, described when she talked about moving beyond StateChat into agentic AI: systems that can take actions across multiple tools and workflows on an employee’s behalf.

This post is part of our AI in Government & Public Sector series, and I’m going to take a firm stance: agentic AI will matter more for national security operations than most flashy model demos. Not because it writes better prose, but because it can reduce “administrative drag” that quietly steals hours from mission planning, intelligence analysis, cyber defense, logistics, and diplomatic operations.

Agentic AI is about outcomes, not answers

Agentic AI is most valuable when it’s measured by tasks completed, not prompts answered. A chatbot that explains a policy is helpful; an agent that files the form, routes the approval, updates the record, and confirms completion is operational.

In the State Department example, Fletcher’s point was simple:

“I want it to not only tell me ‘How much leave do I have’… but then I want it to put in my leave slip, which is in a different system.”

That “different system” part is where government and defense workflows usually break. Policy knowledge is centralized enough to search. Execution is scattered across HR platforms, case management systems, ticketing tools, legacy apps, and role-based portals.

Why this matters for defense and national security

Defense and national security organizations have the same friction—just with higher stakes:

  • A planner pulls data from one system, drafts a brief in another, and coordinates approvals in a third.
  • A cyber analyst triages alerts in one console, submits an incident ticket in another, and documents steps in a knowledge base.
  • A logistics officer checks status across multiple dashboards, then emails five people to reconcile mismatches.

Agentic AI is the first mainstream technology that can plausibly span those seams—if it’s engineered with security, auditing, and guardrails from day one.

The boring win: reducing administrative toil at scale

The State Department’s best early use cases weren’t “AI does diplomacy.” They were practical, like helping staff navigate internal rules. Fletcher’s example about moving a pet to an overseas post is funny because it’s true: employees waste time hunting through manuals for niche but urgent requirements.

Administrative toil is a national security issue for one reason: it competes directly with time for mission work.

Here’s how I’ve seen this play out in real programs: leaders spend months arguing about model accuracy, while the workforce burns thousands of hours each week on tasks that don’t require judgment—just persistence across systems.

Where agentic AI actually pays off

Agentic AI tends to deliver value fastest in workflows that are:

  1. High-volume (lots of repetitions across the workforce)
  2. Rules-based (clear policy or procedural constraints)
  3. Cross-system (the work breaks because tools don’t talk)
  4. Low-risk or controllable-risk (you can add approvals and audit trails)

For government and defense environments, strong candidates include:

  • Leave, travel, training requests, and status checks
  • Policy-guided routing (who needs to approve what, and when)
  • Drafting and packaging routine artifacts (memos, summaries, tickets)
  • Cyber alert triage and prioritization (with human confirmation)
  • Case intake and document classification (with strict access controls)

If you want a simple north star: start where the agent can save 15 minutes per person per week across 10,000 people. That’s 2,500 hours per week—more than 60 full-time equivalents of time returned to mission work.
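A quick back-of-the-envelope check of that math (the 10,000-person scale and a 40-hour FTE week are illustrative assumptions, not State Department figures):

```python
# Back-of-the-envelope estimate of time returned by a modest agentic win.
# Assumptions (illustrative): 10,000 users, 15 minutes saved per person
# per week, 40-hour work week per full-time equivalent.
minutes_saved_per_person_per_week = 15
users = 10_000
hours_per_fte_week = 40

total_hours_per_week = minutes_saved_per_person_per_week * users / 60
fte_equivalent = total_hours_per_week / hours_per_fte_week

print(f"{total_hours_per_week:,.0f} hours/week, about {fte_equivalent:.1f} FTEs")
# -> 2,500 hours/week, about 62.5 FTEs
```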

Adoption is the hard part—and State’s numbers prove it

One of Fletcher’s most practical observations was that State underestimated the effort required to build confidence and habits.

She described still fielding the question, even a month earlier, of whether employees were allowed to use the tool at all. That's not a technical problem; it's a governance and change-management problem.

The adoption pattern you should expect

Most agencies and defense orgs will see a similar sequence:

  • Phase 1: Permission — “Am I allowed to use this?”
  • Phase 2: Safety — “Will I get in trouble if it’s wrong?”
  • Phase 3: Fit — “When should I use it vs. do it myself?”
  • Phase 4: Trust — “Can it touch my systems, or just advise me?”

Agentic AI raises the stakes because it moves from content generation to action execution. The moment an AI can submit a form, trigger a workflow, or change a record, users (and auditors) will demand clear answers to:

  • Who approved the action?
  • What data was used?
  • What policy constrained the decision?
  • What changed, exactly?

If you can’t answer those questions automatically, agentic AI won’t scale in a regulated national security environment.
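One way to make those answers automatic is to require a structured audit record for every action an agent takes or proposes. A minimal sketch, with field names that are illustrative rather than any standard schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentActionRecord:
    """One auditable record per agent action. The four audit questions
    map to required fields; names here are illustrative, not a standard."""
    action: str                   # what changed, exactly (e.g., "submit_leave_request")
    actor: str                    # the user the agent acted on behalf of
    approved_by: str              # who approved the action
    data_sources: list[str] = field(default_factory=list)   # what data was used
    policy_refs: list[str] = field(default_factory=list)    # what policy constrained the decision
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        if not self.approved_by:
            raise ValueError("Agent actions must carry an explicit approver.")
```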

What training looks like when it works

The training that sticks isn’t “here’s how to prompt.” It’s:

  • Role-based playbooks (what to use, when, and why)
  • Three or four blessed workflows (repeatable wins)
  • Clear escalation paths (what to do when the AI is wrong)
  • A feedback loop that visibly improves the system every month

State’s jump to 45,000–50,000 users suggests a key truth: once employees see personal time returned to them, adoption becomes contagious. But you have to get them over the first hump.

Agentic AI changes the security model—plan for it upfront

The reason agentic AI is both promising and risky is the same: it can operate across systems.

In defense and national security, the security model can’t be “the AI has access to everything.” It has to be least privilege, with explicit controls on what actions are allowed.

Practical guardrails for agentic AI in sensitive environments

If you’re designing or buying agentic AI for government workflows, these are non-negotiable:

  • Action gating: the agent can propose actions; humans approve high-impact steps.
  • Strong identity binding: every action is attributable to a user and role, with clear delegation rules.
  • Tool-level permissions: the agent inherits scoped permissions per system, not blanket access.
  • Audit-by-default logs: prompts, tool calls, retrieved documents, and final actions are recorded and reviewable.
  • Policy-as-code constraints: the agent is restricted by explicit rules (what it can file, change, route, or disclose).

A useful mental model: treat an AI agent like a junior staffer with access to your systems. You wouldn’t give a new hire admin privileges on day one. Don’t do it with an agent either.
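To make "action gating" and "policy-as-code" concrete, here is a minimal, default-deny sketch; the tool names and risk tiers are hypothetical examples, not a specific product's API:

```python
# Minimal sketch of action gating: the agent proposes actions, and a
# policy-as-code table decides whether they auto-execute, require human
# approval, or are refused. Tool names and tiers are hypothetical.
from enum import Enum

class Gate(Enum):
    ALLOW = "allow"                 # low-impact, execute with audit logging
    REQUIRE_APPROVAL = "approve"    # pause until a named human approves
    DENY = "deny"                   # outside the agent's permitted scope

ACTION_POLICY = {
    "search_policy_manual": Gate.ALLOW,
    "draft_leave_request": Gate.ALLOW,
    "submit_leave_request": Gate.REQUIRE_APPROVAL,
    "modify_personnel_record": Gate.DENY,
}

def gate_action(action: str) -> Gate:
    # Default-deny: anything not explicitly listed is refused.
    return ACTION_POLICY.get(action, Gate.DENY)

# The agent can draft the form, but a human still clicks "submit".
assert gate_action("draft_leave_request") is Gate.ALLOW
assert gate_action("submit_leave_request") is Gate.REQUIRE_APPROVAL
assert gate_action("delete_case_file") is Gate.DENY
```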

Testing is harder than people admit

Traditional software testing checks whether a function returns the expected value. Agentic AI testing must validate whether the agent:

  • followed the approved workflow,
  • used authorized data,
  • asked for approval at the right moment,
  • and produced a correct, auditable action.

That’s why many early agent deployments will succeed in constrained domains (HR, IT helpdesk, policy navigation) before expanding into mission operations.
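In practice, that means checking the agent's recorded trace of tool calls, not just its final output. A rough sketch, assuming a simple trace format invented here for illustration:

```python
# Sketch of a workflow-level check: validate the recorded trace of tool
# calls the agent made. The trace format, tool names, and source names
# are illustrative assumptions, not a real framework's API.
APPROVED_SOURCES = {"hr_policy_manual", "leave_balance_api"}

def validate_trace(trace: list[dict]) -> list[str]:
    """Return a list of policy violations found in an agent's action trace."""
    violations = []
    tools = [step["tool"] for step in trace]

    # Approval must happen, and it must happen before the submit step.
    if "submit_leave_request" in tools:
        if "request_human_approval" not in tools:
            violations.append("submitted without human approval")
        elif tools.index("request_human_approval") > tools.index("submit_leave_request"):
            violations.append("approval requested after submission")

    # Only authorized data sources may be consulted.
    for step in trace:
        for source in step.get("sources", []):
            if source not in APPROVED_SOURCES:
                violations.append(f"unauthorized source: {source}")
    return violations

# Example trace that follows the approved workflow -> no violations.
good_trace = [
    {"tool": "draft_leave_request", "sources": ["hr_policy_manual"]},
    {"tool": "request_human_approval"},
    {"tool": "submit_leave_request", "sources": ["leave_balance_api"]},
]
assert validate_trace(good_trace) == []
```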

From State Department workflows to mission operations

The bridge from State’s administrative use cases to defense missions isn’t theoretical. It’s the same workflow pattern with different nouns.

Mission planning: agents as staff accelerators

Mission planning often involves collecting inputs, reconciling versions, routing approvals, and producing briefings. An agent can:

  • pull the latest authoritative data from approved sources,
  • assemble a draft product in the right format,
  • generate a checklist of missing inputs,
  • route to the correct approvers,
  • track responses and update the package.

Humans still own decisions. The agent owns the glue work.
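A toy sketch of that glue work, where the required inputs, approvers, and field names are all placeholders:

```python
# Sketch of the "glue work" loop for a planning package: gather inputs,
# flag what's missing, and route only complete packages for approval.
# Input names, approvers, and fields are hypothetical placeholders.
REQUIRED_INPUTS = ["intel_summary", "logistics_status", "comms_plan"]
APPROVERS = ["ops_lead", "legal_review"]

def assemble_package(available_inputs: dict) -> dict:
    missing = [name for name in REQUIRED_INPUTS if name not in available_inputs]
    return {
        "draft_sections": {k: available_inputs[k] for k in REQUIRED_INPUTS if k in available_inputs},
        "missing_inputs_checklist": missing,
        "route_to": APPROVERS if not missing else [],
        "status": "ready_for_approval" if not missing else "awaiting_inputs",
    }

package = assemble_package({"intel_summary": "...", "comms_plan": "..."})
print(package["status"], package["missing_inputs_checklist"])
# -> awaiting_inputs ['logistics_status']
```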

Intelligence analysis: agents as context managers

Analysts drown in documents, not because they can’t read, but because time is limited. Properly governed AI agents can:

  • watch for new items in assigned collections,
  • summarize changes since the last brief,
  • extract entities and link them to existing case files,
  • recommend what needs a human look.

The win isn’t “AI replaces analysis.” The win is AI protects analyst time for the hard calls.

Cybersecurity: agents as alert triage partners

Fletcher mentioned embedding AI to prioritize cybersecurity alerts. That’s where agentic patterns shine:

  • cluster similar alerts,
  • enrich with context (asset criticality, recent changes, known IOCs),
  • open a ticket with a pre-filled narrative,
  • propose containment steps with a required human approval step.

This is how you scale cyber operations without pretending staffing doesn’t matter.
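A minimal sketch of that triage pattern; the scoring weights, enrichment data, and field names are stand-in assumptions rather than a real SOC integration:

```python
# Sketch of the triage pattern: enrich, prioritize, then draft a ticket
# that still requires human approval before any containment happens.
def enrich(alert: dict, asset_criticality: dict, known_iocs: set) -> dict:
    alert = dict(alert)
    alert["asset_critical"] = asset_criticality.get(alert["host"], "low") == "high"
    alert["matches_ioc"] = alert.get("indicator") in known_iocs
    return alert

def priority(alert: dict) -> int:
    score = 1
    if alert["asset_critical"]:
        score += 2
    if alert["matches_ioc"]:
        score += 3
    return score

def draft_ticket(alert: dict) -> dict:
    # The agent pre-fills the narrative; containment needs analyst approval.
    return {
        "summary": f"{alert['rule']} on {alert['host']} (priority {priority(alert)})",
        "proposed_containment": "isolate host pending analyst approval",
        "requires_human_approval": True,
    }

alerts = [
    {"rule": "suspicious_login", "host": "hr-laptop-42", "indicator": "203.0.113.9"},
    {"rule": "malware_beacon", "host": "domain-controller-1", "indicator": "198.51.100.7"},
]
enriched = [enrich(a, {"domain-controller-1": "high"}, {"198.51.100.7"}) for a in alerts]
for alert in sorted(enriched, key=priority, reverse=True):
    print(draft_ticket(alert)["summary"])
```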

A practical roadmap for deploying agentic AI in government

Most organizations get stuck because they start with a giant vision and no execution path. A better approach is staged autonomy.

Stage 1: “Read-only” copilots (weeks)

Start with assistants that can search, summarize, and draft—using approved internal knowledge bases.

Success metric: weekly active users and time saved per task.

Stage 2: “Suggest-only” agents (1–3 months)

Let agents propose workflows and fill out forms, but require users to click “submit.”

Success metric: reduction in cycle time (how long approvals and submissions take).

Stage 3: “Bounded-action” agents (3–9 months)

Grant limited action permissions for low-risk tasks (e.g., create tickets, schedule meetings, route documents) with full audit logging.

Success metric: task completion rate and exception rate (how often humans had to intervene).

Stage 4: Mission-aligned agents (ongoing)

Expand into mission workflows only after governance, security, and testing maturity is proven.

Success metric: mission throughput (more analyses produced, faster planning cycles, improved triage).
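One way to keep those stages enforceable rather than aspirational is to encode the current autonomy ceiling as configuration the platform checks before any tool call. A sketch, with capability names assumed for illustration:

```python
# Sketch: the autonomy ceiling as data the platform enforces, so moving to
# the next stage is a reviewed configuration change. Capability names and
# the stage mapping are illustrative assumptions.
AUTONOMY_STAGES = {
    "read_only":      {"search": True, "draft": True, "fill_forms": False, "execute_actions": False},
    "suggest_only":   {"search": True, "draft": True, "fill_forms": True,  "execute_actions": False},
    "bounded_action": {"search": True, "draft": True, "fill_forms": True,  "execute_actions": True},
}

CURRENT_STAGE = "suggest_only"   # promoted only after governance review

def is_permitted(capability: str) -> bool:
    return AUTONOMY_STAGES[CURRENT_STAGE].get(capability, False)

assert is_permitted("fill_forms")            # agent may pre-fill the form
assert not is_permitted("execute_actions")   # but a human still clicks submit
```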

What leaders should do before buying anything

If you’re responsible for AI adoption in defense, national security, or the broader public sector, you’ll save time by forcing clarity early:

  1. Pick one painful workflow with a measurable baseline (cycle time, backlog size, labor hours).
  2. Decide the autonomy ceiling upfront (read-only, suggest-only, or bounded-action).
  3. Define the audit requirement in plain language (who needs to see what, and when).
  4. Plan training as a product, not a one-off webinar.
  5. Budget for integration, because cross-system action is where value lives.

Agentic AI succeeds when it’s treated as an operational system, not an innovation lab experiment.

Where this is headed in 2026—and why you should care now

AI vendors are openly targeting agents that can run multi-hour or multi-day tasks with limited supervision. That timeline isn’t distant. Procurement cycles, accreditation, and workforce training alone can eat a year.

The State Department’s experience offers a clean lesson for our AI in Government & Public Sector series: the fastest path to mission impact is often through the “unsexy” work. Automate the administrative friction first, earn trust, and then expand autonomy into mission workflows.

If you’re building toward agentic AI for mission planning and operational efficiency, the next step is straightforward: identify two workflows—one administrative and one mission-adjacent—where an agent can reduce cycle time without expanding risk. Then prove it with logs, controls, and measurable outcomes.

The question I’d leave you with is the one leaders should be asking heading into 2026 budgeting: Which mission-critical teams are doing “glue work” that an agent could handle—and what would they do with that time back?