AI-Ready Federal Performance Ratings Under New OPM Rules

OPM’s draft rule could cap top federal ratings and simplify scoring. Here’s how AI can make evaluations fairer, faster, and more evidence-based.
A performance system that rates almost everyone as above average isn’t “kind.” It’s low-information. And when a system produces low-information outputs, leaders compensate with workarounds: informal reputations, ad hoc award decisions, and “everyone knows who the real stars are” conversations that never make it into the record.
That’s the backdrop for a draft Office of Personnel Management (OPM) rule reported this week: the Trump administration is preparing to cap top performance ratings for most federal employees and consolidate scoring by reducing the rating scale from five levels to four. The stated goal is straightforward—reduce rating inflation and improve differentiation—but the implementation details matter, especially for agencies already under pressure from hiring constraints, retention challenges, and heightened scrutiny over workforce productivity.
From the lens of our “AI in Government & Public Sector” series, this rule is a useful forcing function. It exposes a hard truth: you can’t standardize performance outcomes without modernizing how performance evidence is gathered and interpreted. Done poorly, forced distribution becomes a demoralizing quota exercise. Done well, it becomes the start of a more credible, data-driven performance management system—one where AI can help supervisors do the job they already wish they had time to do.
What the draft OPM rule changes—and why it’s happening now
Answer first: The draft rule shifts federal performance management toward stricter differentiation by allowing caps on top ratings (a form of forced distribution), simplifying the scale to four levels, and increasing OPM oversight of agency appraisal systems.
Here’s what’s in the reported draft proposal:
- Allow caps on top ratings (4s and 5s). Current rules have treated forced distribution as prohibited, but the draft would remove that prohibition and allow agencies to set limits on how many people can receive top scores.
- Move from a 5-level to a 4-level rating structure by eliminating the “minimally satisfactory” level. The intent is to create clearer separation between acceptable and unacceptable performance.
- Reduce friction for “unacceptable” ratings by allowing supervisors to issue a level-one rating without the higher-level review currently required.
- Increase OPM oversight by requiring agencies to re-seek appraisal system approval every other year, with potential implications for performance award spending if systems don’t meet criteria.
- Exclude certain political appointees (Schedule C and the newer Schedule G) from forced distribution requirements, responding to concerns that political executives could dominate top ratings and award pools.
The policy rationale is rooted in data OPM and oversight bodies have highlighted for years:
- A GAO report cited in the coverage found that in 2013, 99% of employees received at least “fully successful.”
- More than 60% received “exceeds” or “outstanding” ratings (4 or 5 on a 5-point scale) in that analysis.
- In OPM’s more recent look at FY 2022–FY 2024, about two-thirds of non-SES employees reportedly received 4s or 5s, while only 0.6% received below a 3.
- In the 2024 Federal Employee Viewpoint Survey, only 47% agreed that performance differences are recognized meaningfully in their work units—reported as the lowest positive response across the survey’s questions.
The rule also arrives at a tense time: agencies are managing workforce reductions, delayed reorganizations, and heightened demands to show measurable outcomes. In practice, that often means leadership wants stronger performance signals—fast.
The risk: forced distribution can turn performance into a math problem
Answer first: Forced distribution improves performance management only when there’s credible, comparable evidence; without it, caps can become arbitrary and damage trust, retention, and mission execution.
Forced distribution is attractive because it promises differentiation. But differentiation isn’t the same as accuracy.
One critique in the reporting gets to the heart of it: if managers believe 35% of their staff earned “outstanding” but policy permits only 30%, someone will get downgraded to satisfy the curve. That downgrade may not be explainable in mission terms.
Here’s what I’ve found in real organizations: when ratings feel like quotas, people optimize for politics, not performance. In government, that risk is amplified because:
- Work is often team-based and interdependent, making “relative” comparisons tricky.
- Outcomes can be long-cycle (regulations, procurement, grants oversight) rather than immediate.
- Supervisors manage large spans of control, leaving little time for continuous coaching.
- Performance evidence is scattered across systems (tickets, case notes, email trails, document repositories).
The result is predictable: more grievances, less candor, and quieter high performers who decide to leave rather than fight over a point on a scale.
And there’s a second-order effect the draft hints at: supervisors’ compliance with rating rules becomes part of their own appraisals. If managers feel personally penalized for giving “too many” top ratings, they’ll rate down “out of caution,” not because the employee didn’t deliver.
Where AI fits: standardizing evidence, not just scores
Answer first: AI can make federal performance evaluations more objective by turning scattered work signals into consistent evidence summaries, improving calibration, documentation quality, and coaching—without turning evaluations into surveillance.
If agencies are moving toward more standardized performance metrics, they need standardized inputs. That’s the opening for AI.
Used responsibly, AI in government workforce management can help in four practical ways:
1) Evidence capture that doesn’t waste supervisor time
Most federal supervisors aren’t unwilling to document performance; they’re overwhelmed. AI can reduce the administrative load by:
- Summarizing work artifacts (case notes, decision memos, ticket histories, inspection logs)
- Drafting achievement narratives aligned to performance elements
- Flagging missing documentation early in the rating cycle
The goal isn’t to “auto-rate” people. It’s to ensure that when rating season arrives, the supervisor isn’t reconstructing a year of work from memory.
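To make “flagging missing documentation early” concrete, here is a minimal sketch in Python. It assumes evidence has already been pulled from agency systems into simple records; the element names, fields, and two-item threshold are hypothetical, and the output is a prompt for the supervisor, not a rating.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class EvidenceItem:
    element: str       # which performance element this item supports
    source: str        # e.g., "case system", "QA review", "customer email"
    summary: str
    observed_on: date

def flag_documentation_gaps(elements: list[str],
                            evidence: list[EvidenceItem],
                            cycle_start: date) -> dict[str, str]:
    """Return a warning for each performance element with thin or missing evidence."""
    gaps = {}
    for element in elements:
        items = [e for e in evidence
                 if e.element == element and e.observed_on >= cycle_start]
        if not items:
            gaps[element] = "No documented evidence this cycle"
        elif len(items) < 2:
            gaps[element] = "Only one evidence item; consider adding QA or customer feedback"
    return gaps

# An element with no evidence gets surfaced months before rating season.
gaps = flag_documentation_gaps(
    elements=["Case processing quality", "Stakeholder communication"],
    evidence=[EvidenceItem("Case processing quality", "QA review",
                           "Zero errors across 40 sampled cases", date(2025, 4, 2))],
    cycle_start=date(2024, 10, 1),
)
print(gaps)  # {'Stakeholder communication': 'No documented evidence this cycle'}
```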
2) Calibration support without turning it into a quota drill
Calibration meetings are where fairness is won or lost. AI can help teams compare like-for-like by:
- Normalizing performance evidence into common categories (quality, timeliness, customer impact, compliance)
- Highlighting outliers (e.g., someone rated top-tier with unusually thin evidence)
- Detecting inconsistent standards across units (the “everyone is a 5 in Branch A” problem)
This matters under a cap system because the conflict shifts from “do we need to differentiate?” to “who has the best-supported case for a top rating?” That’s a healthier argument.
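Here is a rough sketch of two of those calibration checks in Python: flagging top-tier proposals with thin evidence, and surfacing units whose share of top ratings sits far above the overall share. The records, the three-item threshold, and the 25-point gap are placeholders; the flags are discussion starters for the calibration meeting, not decisions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical calibration inputs: (unit, employee_id, proposed_rating, evidence_item_count)
proposals = [
    ("Branch A", "e1", 5, 6), ("Branch A", "e2", 5, 1), ("Branch A", "e3", 5, 4),
    ("Branch B", "e4", 3, 3), ("Branch B", "e5", 4, 5), ("Branch B", "e6", 3, 2),
]

# Outlier check: top-tier proposals supported by unusually thin evidence.
thin_evidence = [(unit, emp) for unit, emp, rating, n_items in proposals
                 if rating >= 4 and n_items < 3]

# Consistency check: units whose share of top ratings is far above the overall share.
ratings_by_unit = defaultdict(list)
for unit, _, rating, _ in proposals:
    ratings_by_unit[unit].append(rating)

overall_top_share = mean(r >= 4 for _, _, r, _ in proposals)
skewed_units = {unit: round(mean(r >= 4 for r in ratings), 2)
                for unit, ratings in ratings_by_unit.items()
                if mean(r >= 4 for r in ratings) > overall_top_share + 0.25}

print(thin_evidence)  # [('Branch A', 'e2')] -> a conversation, not an automatic downgrade
print(skewed_units)   # {'Branch A': 1.0} -> the "everyone is a 5 in Branch A" signal
```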
3) Continuous coaching nudges (the part the rule doesn’t solve)
One expert quoted in the article emphasizes the missing ingredient: continuous engagement. AI can help agencies operationalize that by:
- Prompting supervisors with quarterly check-in templates
- Suggesting coaching actions based on observed patterns (missed deadlines, rework, backlog spikes)
- Creating lightweight, auditable notes that support later decisions
A forced distribution system without coaching is mostly punishment. A differentiated system with coaching can be development.
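As a sketch of what a coaching nudge could look like in practice, the snippet below turns a few quarterly signals into plain-language suggestions for a supervisor. The signal names and thresholds are hypothetical, and the output is a note to a human, not an assessment of the employee.

```python
from datetime import date, timedelta

# Hypothetical quarterly signals, pulled from existing workflow systems.
signals = {
    "missed_deadlines": 3,
    "rework_rate": 0.18,      # share of items returned for correction
    "backlog_change": 22,     # net change in open items this quarter
    "last_checkin": date(2025, 1, 15),
}

def coaching_nudges(signals: dict, today: date) -> list[str]:
    """Suggest supervisor actions; a human decides what, if anything, to do with them."""
    nudges = []
    if (today - signals["last_checkin"]) > timedelta(days=90):
        nudges.append("No documented check-in this quarter; schedule one and log a short note.")
    if signals["missed_deadlines"] >= 3:
        nudges.append("Several missed deadlines; review workload and blockers before the next cycle.")
    if signals["rework_rate"] > 0.15:
        nudges.append("Rework is elevated; consider pairing on the next two deliverables.")
    if signals["backlog_change"] > 20:
        nudges.append("Backlog is growing; confirm whether it is volume-driven or performance-related.")
    return nudges

for nudge in coaching_nudges(signals, today=date(2025, 5, 1)):
    print("-", nudge)
```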
4) Better accountability with guardrails
The draft proposal lowers the barrier to issuing an “unacceptable” rating. That can speed corrective action—but also raises risk if documentation is thin.
AI can help ensure due process by:
- Checking that performance deficiencies map to the right critical elements
- Ensuring the documentation timeline is coherent
- Recommending when a performance improvement plan (PIP) workflow is triggered
That’s not about replacing HR or legal review. It’s about reducing preventable errors that lead to reversals, grievances, or inconsistent treatment.
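Here is a minimal sketch of those pre-action checks, assuming deficiencies are logged as structured records: it verifies that each deficiency maps to a critical element in the employee’s plan, that the timeline is coherent, and that coaching was documented. Field names and element names are illustrative, and the function flags gaps for HR and legal review rather than recommending a rating.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Deficiency:
    critical_element: str   # must match an element in the employee's performance plan
    described_on: date
    coaching_documented: bool

def preaction_checks(plan_elements: set[str],
                     deficiencies: list[Deficiency],
                     proposed_rating_date: date) -> list[str]:
    """Flag documentation gaps a reviewer would catch later; never decides the rating."""
    issues = []
    if not deficiencies:
        issues.append("An 'unacceptable' rating with no documented deficiencies will not survive review.")
    for d in deficiencies:
        if d.critical_element not in plan_elements:
            issues.append(f"Deficiency cites '{d.critical_element}', which is not a critical element in the plan.")
        if d.described_on > proposed_rating_date:
            issues.append("A deficiency is dated after the proposed rating; the timeline is incoherent.")
        if not d.coaching_documented:
            issues.append(f"No documented coaching tied to '{d.critical_element}' before the rating.")
    return issues
```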
A practical blueprint for agencies: AI-enabled performance management that won’t backfire
Answer first: Agencies can prepare for rating caps by modernizing performance evidence workflows, building transparent standards, and using AI to improve consistency—not to make final decisions.
If your agency is anticipating changes like these (or already implementing “cap-like” guidance), here’s a workable path that avoids the worst outcomes.
Step 1: Define “top performance” in mission terms
If “outstanding” means “impact,” define impact. Don’t leave it to vibes.
- What outputs count? (cases closed, inspections completed, grants processed)
- What quality thresholds matter? (error rates, rework, audit findings)
- What citizen/customer outcomes are relevant? (timeliness, satisfaction, safety)
Write these definitions in plain language. If the definition can’t be explained to an employee in 60 seconds, it’s too abstract.
Step 2: Build an evidence map before you buy tools
Map where performance signals already live:
- Service/ticketing platforms
- Case management systems
- Quality assurance reviews
- Training and certification records
- Collaboration tools and document repositories
Then decide what AI should summarize vs. what must remain human judgment. AI should support the file, not decide the verdict.
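One lightweight way to record that decision is an explicit evidence map in configuration, sketched below. The system names and signals are placeholders; the point is the flag that says what AI may summarize and what stays human-read.

```python
# A minimal evidence map. System names and signals are placeholders; the decision
# worth capturing is which sources AI may summarize and which stay human-only.
EVIDENCE_MAP = [
    {"source": "service_ticketing", "signals": ["resolution time", "reopen rate"],   "ai_summarize": True},
    {"source": "case_management",   "signals": ["cases closed", "decision quality"], "ai_summarize": True},
    {"source": "qa_reviews",        "signals": ["error rate", "audit findings"],     "ai_summarize": True},
    {"source": "training_records",  "signals": ["certifications", "completions"],    "ai_summarize": True},
    {"source": "peer_feedback",     "signals": ["narrative comments"],               "ai_summarize": False},
]

human_only = [m["source"] for m in EVIDENCE_MAP if not m["ai_summarize"]]
print(human_only)  # ['peer_feedback'] stays with the supervisor's judgment
```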
Step 3: Create “minimum documentation standards” for every rating level
Caps increase scrutiny. The easiest way to reduce drama is to standardize what “good evidence” looks like.
For example:
- Top rating: 3–5 documented achievements tied to mission outcomes + peer/customer feedback + quality confirmation
- Meets expectations: consistent delivery against objectives + acceptable quality checks
- Unacceptable: documented deficiencies + coaching attempts + missed improvement milestones
The numbers above are illustrative, but the principle is firm: ratings should be auditable.
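Those standards do more work when they are machine-checkable. The sketch below uses the illustrative thresholds from the list above; an agency would substitute its own levels, fields, and criteria.

```python
# Illustrative documentation standards keyed by rating level (5 = top, 1 = unacceptable).
STANDARDS = {
    5: {"min_achievements": 3, "needs_external_feedback": True,  "needs_quality_confirmation": True},
    3: {"min_achievements": 1, "needs_external_feedback": False, "needs_quality_confirmation": True},
    1: {"min_deficiencies": 1, "needs_coaching_record": True},
}

def rating_is_auditable(rating: int, file: dict) -> bool:
    """Check a rating file against the documentation standard for that level."""
    std = STANDARDS.get(rating, {})
    return all([
        file.get("achievements", 0) >= std.get("min_achievements", 0),
        file.get("deficiencies", 0) >= std.get("min_deficiencies", 0),
        file.get("external_feedback", False) or not std.get("needs_external_feedback", False),
        file.get("quality_confirmation", False) or not std.get("needs_quality_confirmation", False),
        file.get("coaching_record", False) or not std.get("needs_coaching_record", False),
    ])

print(rating_is_auditable(5, {"achievements": 4, "external_feedback": True, "quality_confirmation": True}))  # True
print(rating_is_auditable(5, {"achievements": 2}))  # False: a thin file is where the calibration conversation starts
```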
Step 4: Put bias and privacy controls in writing
Performance management systems fail when employees feel watched or unfairly scored.
If you introduce AI support, publish clear rules:
- What data is in scope (and what is out of scope)
- How outputs are used (drafting, summarization, consistency checks)
- What AI will never do (final ratings, disciplinary recommendations)
- How employees can challenge factual errors
Trust is a deliverable. Treat it like one.
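One way to treat it like one is to publish those rules in a form that systems can enforce, not just a memo. The sketch below is illustrative; the scope lists, permitted uses, and challenge process are placeholders for whatever your agency actually publishes.

```python
# Illustrative AI-use policy, expressed as data so tooling can check requests against it.
AI_USE_POLICY = {
    "data_in_scope": ["work products", "case outcomes", "QA results", "ticket metrics"],
    "data_out_of_scope": ["personal email", "medical records", "union activity"],
    "permitted_uses": ["drafting narratives", "summarization", "consistency checks"],
    "prohibited_uses": ["final ratings", "disciplinary recommendations"],
    "employee_challenge_process": "Factual corrections go through the HR portal and are logged.",
}

def use_is_permitted(requested_use: str) -> bool:
    """A request is allowed only if it is explicitly permitted and not explicitly prohibited."""
    return (requested_use in AI_USE_POLICY["permitted_uses"]
            and requested_use not in AI_USE_POLICY["prohibited_uses"])

print(use_is_permitted("summarization"))   # True
print(use_is_permitted("final ratings"))   # False: AI never makes the call
```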
People also ask: the questions leaders should be ready to answer
Does a cap on top ratings improve performance? A cap changes the distribution of ratings; whether it improves performance depends on coaching, evidence quality, and perceived fairness.
Won’t this just demoralize employees? It will if it feels arbitrary. It won’t if employees see clear standards, consistent documentation, and real development conversations throughout the year.
Can AI replace supervisors in evaluations? No—and it shouldn’t. The highest-value use of AI is reducing administrative burden and improving consistency so supervisors can spend more time coaching.
What’s the biggest operational risk for agencies? Increased conflict during calibration and awards decisions, especially if documentation practices aren’t mature. AI can help, but only with governance and guardrails.
Where this heads next—and what to do before February planning cycles
This draft rule (and agency experiments already underway) is part of a broader trend: government is pushing for process optimization through standardization, and performance management is one of the most visible places to do it. The uncomfortable part is that performance management is also one of the most human, trust-dependent processes in any organization.
If agencies respond with “just enforce a curve,” they’ll get compliance—plus resentment, attrition, and risk to mission continuity. If they respond by improving the inputs—clear standards, better evidence, and ongoing coaching—they can finally get performance ratings that mean something.
If you’re leading HR, operations, analytics, or digital transformation in the public sector, now is the moment to assess how AI can support performance evaluation systems responsibly: evidence summaries, calibration support, documentation quality checks, and coaching workflows. That’s where AI earns its keep.
Where would your agency benefit most—evidence capture, calibration consistency, or continuous coaching—if rating caps become the new normal?