Treasury’s ‘Great Gatsby’ AI Test: What It Signals

AI in Government & Public Sector • By 3L3C

Treasury’s Gatsby-style AI hiring test shows agencies are prioritizing AI literacy—sometimes awkwardly. Here’s how to assess and hire for AI roles better.

Tags: AI in government · AI hiring · AI literacy · AI governance · Public sector workforce · Generative AI

A federal AI job posting asked applicants to write a 10-page, citation-heavy analysis of metaphors in The Great Gatsby—then compress it into a 200-word executive summary, translate both into Spanish and Mandarin, compare themes across other novels in a table, and finally rewrite the essay like a scientific paper.

It’s easy to laugh. But I don’t think the bigger story is “government is weird.” The bigger story is that agencies are struggling to hire for AI roles because the labor market has moved faster than government hiring playbooks. The Gatsby prompt isn’t just quirky—it’s a case study in how public sector employers are trying (sometimes awkwardly) to measure AI literacy, communication skill, and policy fluency in one shot.

This post is part of our AI in Government & Public Sector series, where we track how AI is changing government work—procurement, oversight, service delivery, and yes, hiring.

What the Gatsby assignment is really testing (and why that’s risky)

Answer first: The Gatsby assignment is likely trying to test whether applicants can use generative AI tools to produce, compress, translate, and reformat content—skills that look like “prompting,” but are really workflow design.

On paper, the tasks map neatly to what people do with modern AI systems:

  • Long-form synthesis with citations (can you structure an argument and support it?)
  • Executive summarization (can you communicate to leadership without losing substance?)
  • Translation (can you serve multilingual stakeholders and spot nuance errors?)
  • Comparative analysis table (can you structure information for decision-making?)
  • Scientific rewrite (can you adapt voice and format for different audiences?)

Those are not random outputs. They mirror common government realities: memos to leadership, public-facing comms, cross-agency briefings, and documentation for oversight.

The risk: testing “prompt performance” instead of job performance

Answer first: If the role is about AI strategy, standards, architecture, and governance, then a literature analysis—even AI-assisted—can become a poor proxy for the real work.

A credible AI role in government typically involves tasks like:

  • Evaluating model risk (privacy, security, bias, misuse)
  • Designing human review and audit processes
  • Writing policy and technical controls for AI deployment
  • Working with procurement and legal teams on vendor claims
  • Implementing monitoring for drift, quality, and incidents

A Gatsby-based prompt doesn’t measure those directly. It mainly measures whether someone can produce polished content under constraints.
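
To make that gap concrete, take just one bullet from the list above: implementing monitoring for drift, quality, and incidents. Even a minimal version of that work is engineering plus governance, not essay writing. The sketch below is illustrative only; the metric, window size, and threshold are assumptions, not an agency standard.

```python
from collections import deque

class QualityDriftMonitor:
    """Minimal sketch: track a rolling quality score (e.g., the pass rate
    from human review) and flag when it falls below the deployment baseline
    by more than a tolerated margin. Names and numbers are illustrative."""

    def __init__(self, baseline: float, window: int = 50, max_drop: float = 0.10):
        self.baseline = baseline            # e.g., 0.92 pass rate at go-live
        self.max_drop = max_drop            # tolerated drop before alerting
        self.scores = deque(maxlen=window)  # most recent reviewed outputs

    def record(self, score: float) -> bool:
        """Record one reviewed output's score; return True if a drift alert fires."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False                    # not enough data yet
        rolling = sum(self.scores) / len(self.scores)
        return (self.baseline - rolling) > self.max_drop

# Toy usage: scores come from human spot checks; alerts feed the incident process.
monitor = QualityDriftMonitor(baseline=0.92, window=5)
for score in [1.0, 0.8, 0.8, 0.8, 0.6]:
    if monitor.record(score):
        print("Drift alert: open an incident and notify the program owner.")
```

A candidate who can explain where those scores come from, who reviews them, and who gets notified when the alert fires is demonstrating the actual job.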

A public critique in the source story made a strong point: measuring prompt engineering alone is unlikely to match the day-to-day duties of a senior AI specialist. I agree with that critique—and I’d go further: government should avoid turning AI hiring into a content-generation contest.

Why agencies are raising the bar on AI literacy right now

Answer first: The public sector is treating AI literacy as a baseline skill, not a specialist hobby, because AI has moved into core operations—benefits, fraud detection, call centers, policy analysis, and cybersecurity.

As of late 2025, agencies are under pressure from three directions at once:

  1. Demand is exploding. More programs want AI-assisted analysis, triage, and decision support.
  2. Trust requirements are higher. Government can’t ship “move fast and break things” workflows into eligibility determinations or enforcement.
  3. Workforces are constrained. Hiring freezes, attrition, and relocation limits mean agencies have to be selective—and creative—about assessment.

That context makes the Gatsby assignment feel less like a prank and more like an overcorrection: “We need people who can work with AI outputs, fast, and explain them to leadership.”

A hiring reality agencies don’t say out loud

Answer first: Many AI roles in government are “translation roles”—bridging policy, tech, and operations—so writing and synthesis matter as much as coding.

If you’ve spent time around AI governance programs, you’ll recognize the pattern:

  • Leadership asks, “Is this safe? Is it legal? What’s the ROI? What’s the failure mode?”
  • Program teams ask, “How do we use this without slowing down?”
  • Security asks, “Where does data go? Who can access logs? What’s retained?”
  • Oversight asks, “Show me evidence. Show me controls. Show me accountability.”

The person who succeeds isn’t always the best model builder. It’s often the one who can write clearly, quantify risk, and turn messy constraints into workable standards.

So yes—communication tests make sense. The problem is how they’re designed.

Better ways to assess AI candidates than a 10-page literary essay

Answer first: Government should assess AI candidates with job-realistic work samples: policy memos, risk assessments, architecture reviews, and incident response simulations.

Here are four assessment formats that map cleanly to public sector AI work and are harder to “game” with generic prompting.

1) The AI procurement reality check

Give candidates a one-page vendor claim: “Our model is unbiased, secure, and explainable.” Ask them to produce:

  • A list of verification questions (data provenance, evaluation, red-team results)
  • A shortlist of contract language they’d request (audit rights, logging, security controls)
  • A risk-rated decision: approve, approve with conditions, or reject

This tests AI literacy where it matters: skepticism, specificity, and governance instinct.
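
If you want the exercise output in a comparable form across candidates, a simple structured template helps. The sketch below is a hypothetical format in Python; the field names and example answers are illustrative, not procurement guidance.

```python
from dataclasses import dataclass, field
from enum import Enum

class Decision(Enum):
    APPROVE = "approve"
    APPROVE_WITH_CONDITIONS = "approve with conditions"
    REJECT = "reject"

@dataclass
class VendorClaimAssessment:
    """Hypothetical exercise output: what the candidate would verify,
    what they would ask for in contract, and their risk-rated call."""
    vendor_claim: str
    verification_questions: list[str] = field(default_factory=list)  # provenance, evals, red-team results
    contract_asks: list[str] = field(default_factory=list)           # audit rights, logging, security controls
    decision: Decision = Decision.APPROVE_WITH_CONDITIONS
    conditions: list[str] = field(default_factory=list)

assessment = VendorClaimAssessment(
    vendor_claim="Our model is unbiased, secure, and explainable.",
    verification_questions=[
        "What evaluation data and metrics support the bias claim?",
        "Who performed red-teaming, and what were the findings?",
    ],
    contract_asks=[
        "Right to audit model evaluations",
        "Prompt and output logging with agreed retention",
    ],
    conditions=["Independent bias evaluation before production use"],
)
```

Scoring then focuses on the quality of the questions and conditions, not the polish of the prose.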

2) A policy-to-implementation translation exercise

Provide a short AI policy requirement (e.g., “human-in-the-loop required for adverse decisions”). Ask candidates to translate it into:

  • A process diagram of review steps
  • A minimal audit log schema (what must be recorded)
  • A set of acceptance tests

If someone can do that well, they can probably operate in a real agency environment.
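
For the audit-log piece, the bar is modest: can the candidate name the fields that make an adverse decision reviewable after the fact, and write an acceptance test against them? Here is a minimal sketch assuming a human-in-the-loop rule for adverse decisions; the field names are my own, not an agency schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AdverseDecisionReview:
    """Minimal audit log entry for 'human review required for adverse decisions'.
    Field names are illustrative assumptions."""
    case_id: str
    model_recommendation: str   # e.g., "deny"
    reviewer_id: str            # the human who made the final call
    final_decision: str
    rationale: str              # why the reviewer agreed or overrode
    reviewed_at: datetime

def test_adverse_decision_has_reviewer_and_rationale():
    """Acceptance test: every logged adverse decision carries a reviewer and a rationale."""
    entry = AdverseDecisionReview(
        case_id="2025-000123",
        model_recommendation="deny",
        reviewer_id="jdoe",
        final_decision="deny",
        rationale="Income documentation missing; applicant notified of appeal rights.",
        reviewed_at=datetime.now(timezone.utc),
    )
    assert entry.reviewer_id and entry.rationale

test_adverse_decision_has_reviewer_and_rationale()
```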

3) An AI incident tabletop

Give a scenario: “Model quality drops 20% after a policy change,” or “Sensitive data appears in a prompt log.” Ask:

  • Who needs to be notified (security, privacy, program, leadership)
  • What systems should be paused
  • What evidence is collected
  • How the public communication is handled

This reveals maturity. It also reveals whether the candidate understands government accountability.
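
A lightweight way to run or score the tabletop is to have candidates fill in a first-hour runbook for each scenario. The structure below is a toy example; the contacts, pause actions, and evidence items are placeholders, not a real incident procedure.

```python
# Toy first-hour runbook for the tabletop exercise. Entries are placeholders.
RUNBOOK = {
    "quality_regression": {
        "notify": ["program owner", "AI governance lead"],
        "pause": ["automated triage queue"],
        "evidence": ["evaluation scores before and after the change", "model and prompt versions"],
    },
    "sensitive_data_in_logs": {
        "notify": ["privacy officer", "security operations", "program owner"],
        "pause": ["log exports", "the affected integration"],
        "evidence": ["affected log entries", "access records for the log store"],
    },
}

def open_incident(incident_type: str) -> None:
    """Print the first-hour steps for a given scenario."""
    steps = RUNBOOK[incident_type]
    print("Notify:", ", ".join(steps["notify"]))
    print("Pause:", ", ".join(steps["pause"]))
    print("Collect:", ", ".join(steps["evidence"]))

open_incident("sensitive_data_in_logs")
```

What you're grading is whether the right roles appear in the notify list and whether the evidence would actually support an after-action review.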

4) A constrained generative AI writing test (done right)

If you want to measure AI-assisted writing, do it with government-shaped artifacts:

  • A two-page policy memo with an executive summary
  • A public FAQ written at an 8th-grade reading level
  • A bilingual notice with a review plan to validate translation accuracy

This keeps the “AI literacy” signal while staying relevant.
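
For the bilingual notice, the interesting part is the review plan, not the translation itself. One cheap screening step a candidate might propose is a back-translation consistency check: someone (or a separate model) translates the notice back into English, and a script flags large divergence for a bilingual reviewer. The word-overlap scoring below is deliberately crude and the threshold is an assumption; it only decides what gets escalated to a human.

```python
def overlap_score(a: str, b: str) -> float:
    """Crude word-overlap between the original notice and its back-translation."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / max(len(words_a | words_b), 1)

def needs_bilingual_review(original: str, back_translation: str, threshold: float = 0.5) -> bool:
    """Escalate when the back-translation diverges too much from the original.
    The threshold is an illustrative assumption, not a validated cutoff."""
    return overlap_score(original, back_translation) < threshold

original = "Your benefits application is under review. No action is needed from you at this time."
back_translated = "Your benefits request is being reviewed. You do not need to do anything right now."
print("Escalate to a bilingual reviewer:", needs_bilingual_review(original, back_translated))
```

A candidate who pairs a check like this with an actual bilingual reviewer, rather than trusting the model's translation outright, is showing the judgment you're hiring for.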

A strong hiring test doesn’t ask, “Can you produce a lot of text?” It asks, “Can you produce defensible decisions under real constraints?”

What job seekers should learn from this (especially in late 2025)

Answer first: If you’re applying for AI jobs in government, assume you’ll be evaluated on communication, governance thinking, and practical AI operations—not just model knowledge.

Whether the Gatsby assignment was intentional, experimental, or simply misguided, it reflects a broader shift: AI roles are blending technical and administrative skill sets.

Here’s what I’ve found works for candidates targeting public sector AI roles:

  • Bring a portfolio of “government style” writing. One-page briefings, risk notes, decision memos.
  • Demonstrate evaluation literacy. Know how to talk about hallucinations, bias testing, and measurement, in plain language.
  • Show you understand constraints. Data access, privacy rules, procurement cycles, and security reviews aren’t footnotes—they’re the job.
  • Be explicit about human review. Agencies want candidates who assume AI outputs must be verified, logged, and governed.

And if an application asks for a sprawling writing exercise, treat it like a signal. It may indicate the team values communication, or it may indicate the team doesn’t yet know how to assess AI work. Either way, you’ll want clarity during interviews.

What government leaders should take from the Gatsby moment

Answer first: AI hiring needs modernizing, with clearer role definitions, better assessments, and stronger alignment between job announcements and actual responsibilities.

If you’re leading AI adoption in an agency, hiring is now a delivery risk. A few practical moves help quickly:

Align the assessment with the mission

If the job is “AI strategy and ethical deployment,” then the assessment should include:

  • governance scenarios
  • risk controls
  • documentation and auditability

A creative prompt can be fine. But it must connect to the actual work.

Make AI literacy measurable—not theatrical

Agencies can evaluate AI literacy with small, sharp tasks:

  • detect model errors in a sample output
  • propose a review workflow
  • draft a minimal acceptable use policy for staff

These are 60–90 minute exercises, not multi-day essays.
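
The "detect model errors" exercise, for instance, can be as small as handing candidates a source document plus a model-written summary and asking what they would screen automatically versus route to a human. The toy check below is the kind of thing a candidate might sketch; the regex-based number matching is an illustrative assumption, not a recommended QA method.

```python
import re

def flag_unsupported_numbers(source: str, model_output: str) -> list[str]:
    """Toy screen for one class of model error: figures in the output
    that never appear in the source material. A human still makes the
    final call; this only surfaces items for review."""
    source_numbers = set(re.findall(r"\d[\d,\.]*", source))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", model_output):
        for number in re.findall(r"\d[\d,\.]*", sentence):
            if number not in source_numbers:
                flagged.append(f"Unsupported figure '{number}' in: {sentence.strip()}")
    return flagged

source = "The pilot processed 1,200 claims in March with a 4% error rate."
summary = "The pilot processed 1,200 claims in March with a 2% error rate and saved $3 million."
for issue in flag_unsupported_numbers(source, summary):
    print(issue)
```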

Respect candidate time (especially for senior roles)

High-friction applications reduce your applicant pool—often filtering out exactly the people you want: those currently leading programs, managing teams, or already working in industry.

In a tight talent market, an overly burdensome process doesn’t “raise the bar.” It narrows the funnel.

Remember the December hiring window reality

Late December is a tough time to recruit. People are on leave, budgets are closing, and family schedules dominate. If your application closes quickly, you’ll mainly reach candidates already watching job boards daily.

If you want broader, more diverse applicants, keep windows open longer and provide clearer work sample expectations.

The bigger trend: AI is becoming a core government competency

Answer first: The Gatsby assignment is a symptom of a larger shift: agencies are moving AI from pilots into operations, and they need people who can govern it responsibly.

This is the real thread connecting the story to the broader digital government transformation agenda. AI is no longer a side experiment. It’s influencing:

  • how agencies write and review policy
  • how they deliver services at scale
  • how they manage risk and oversight
  • how they communicate with the public

That’s why AI literacy keeps showing up in job descriptions—even when hiring teams are still learning how to test for it.

If you’re building an AI program in the public sector, this is the moment to tighten the basics: clear standards, measurable assessments, and repeatable governance workflows. If you’re applying for these roles, prepare to prove that you can do more than generate text—you can make AI accountable.

Where does your agency sit right now: still experimenting with AI tools, or already building the controls to run AI in production?