AI in finance is the future—but ROI lags today. Learn why pilots stall and how banks and fintechs can ship reliable AI for fraud, credit, and service.

AI in Finance: Why ROI Lags (and How to Fix It)
Only 15% of executives in a recent Forrester survey said AI improved profit margins over the last year. BCG’s number is even harsher: 5% reported “widespread value.” That gap between belief and results is the story of enterprise AI in 2025—and it’s especially loud in financial services.
Banks and fintechs aren’t short on ideas. They’re short on AI that behaves like software you can trust: consistent, auditable, and integrated with the messy reality of financial data, compliance, and customer expectations. The hype was “easy button.” The reality is more like “new operating model.”
This post is part of our AI in Finance and FinTech series, where we focus on practical adoption: fraud detection, credit scoring, trading, onboarding, and personalised financial experiences. The stance here is simple: AI will reshape finance, but only the institutions that treat reliability as the product will see real ROI.
Why AI ROI is stalling in financial services
AI ROI stalls because generative models are impressive in demos and unpredictable in production. Finance is a production environment.
Recent industry coverage captures the broader enterprise mood: leaders agree AI is the future, but they’re frustrated that it “doesn’t work right now.” That frustration comes from specific failure modes that show up fast in banks and fintechs:
- Sycophancy (over-agreeableness): Models try to please, not challenge. That’s a customer delight problem in wine recommendations—and a risk problem in credit, advice, and disputes.
- Inconsistency: The same question gets different answers. Finance teams can’t sign off on that.
- The “jagged frontier”: A model might excel at complex tasks (math, code) and fail at basic operational ones (dates, entities, definitions, policy recall).
- Data format chaos: Financial firms run on a patchwork of core systems, data vendors, spreadsheets, PDFs, and policy wikis. LLMs can “see patterns that don’t exist” when formats drift.
Here’s the uncomfortable truth: most AI programs don’t fail because the model is dumb; they fail because the organisation expects the model to compensate for weak process, weak data, and unclear decision rights.
The finance-specific reasons AI “doesn’t work right now”
Financial services has a tougher bar than most industries. When AI is wrong, it’s not just embarrassing—it can be a compliance incident, a customer harm issue, or a fraud loss.
The reliability gap: finance needs repeatable outputs
In practice, “AI accuracy” isn’t a single number. Banks need:
- Repeatability: Same inputs → same outputs (or explainable variance)
- Traceability: What data was used, what rules applied, what model version
- Escalation paths: Clear handoff to humans when confidence is low
- Controls: Access, logging, red-teaming, and monitoring
Generative AI often ships without these baked in. That’s why many firms see plenty of experimentation but limited production scale.
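To make that concrete, here’s a minimal sketch of the kind of record you could attach to every generated answer so repeatability and traceability are checkable after the fact. The field names, model identifiers, and threshold are illustrative, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIOutputRecord:
    """Illustrative audit record for one generated answer; adapt to your own schema."""
    question: str
    answer: str
    model_version: str            # which model produced this output
    prompt_version: str           # which prompt template was used
    source_document_ids: list     # what data the answer was grounded in
    confidence: float             # validation score, not a guarantee
    escalated_to_human: bool      # True when confidence falls below threshold
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = AIOutputRecord(
    question="What is the dispute window for card transactions?",
    answer="See policy DOC-123 (draft answer pending review).",
    model_version="model-2025-01",     # hypothetical identifier
    prompt_version="policy-qa-v7",     # hypothetical identifier
    source_document_ids=["DOC-123"],
    confidence=0.62,
    escalated_to_human=True,           # below an assumed 0.7 threshold
)
```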
The data gap: finance data isn’t LLM-ready
As noted above, varied formats can trip models into reading false patterns. In finance, that shows up everywhere:
- Merchant names differ across systems (e.g., “UBER *TRIP” vs “Uber BV”)
- Time windows are ambiguous (“last week” means different things across jurisdictions and reporting calendars)
- Customer identity is fragmented (multiple IDs across channels)
- Product terms live in PDFs and policy pages with exceptions and footnotes
If you don’t standardise entities and definitions, you’re asking the model to guess. Guessing is exactly what regulators and risk teams won’t accept.
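Standardising entities is mostly deterministic plumbing that runs before anything reaches a model. A minimal sketch, assuming a toy lookup table; a real programme would use merchant reference data and fuzzy matching rather than a hard-coded dict.

```python
import re

# Toy canonicalisation table; in practice this comes from merchant reference data.
CANONICAL_MERCHANTS = {
    "UBER TRIP": "Uber",
    "UBER BV": "Uber",
    "AMZN MKTP": "Amazon",
}

def canonical_merchant(raw: str) -> str:
    """Normalise a raw transaction descriptor to a canonical merchant name."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())   # strip punctuation like '*'
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    for prefix, name in CANONICAL_MERCHANTS.items():
        if cleaned.startswith(prefix):
            return name
    return cleaned.title()   # fall back to a tidied version and flag for review

print(canonical_merchant("UBER *TRIP"))   # -> Uber
print(canonical_merchant("Uber BV"))      # -> Uber
```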
The trust gap: customers still want humans for high-stakes moments
Industry coverage describes call centres “leaning back into humans.” Klarna’s experience is the pattern: AI handles simple tasks, but complex issues escalate.
Finance is full of complex issues:
- A disputed transaction with partial evidence
- A hardship request with emotional context
- A mortgage application with unusual income history
- A fraud case where the customer is panicked
A good AI strategy doesn’t replace humans; it protects human capacity for the moments that matter.
What actually works: high-impact, low-lift AI use cases in finance
The fastest path to ROI is to pick problems where (1) value is measurable, (2) failure is containable, and (3) workflows already exist.
1) Fraud operations co-pilots (not fully autonomous agents)
Answer first: A fraud co-pilot works because it reduces investigation time without owning the final decision.
Instead of “AI decides fraud,” use AI to:
- Summarise case history across systems
- Extract key facts from customer messages and call notes
- Draft investigator narratives for SAR/SMR preparation (with strict review)
- Recommend next-best actions based on playbooks
ROI shows up as minutes saved per case, faster queue throughput, and better investigator consistency.
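Here’s a rough sketch of the “assist, don’t decide” shape. Everything below is placeholder scaffolding: the case fields, the prompt wording, and the call_llm stub stand in for whatever case-management system and model provider you actually use.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for your model call; swap in your provider's client."""
    raise NotImplementedError

def build_case_packet(case: dict) -> str:
    """Assemble the facts an investigator would otherwise gather by hand."""
    return json.dumps({
        "customer_id": case["customer_id"],
        "alerts": case["alerts"],
        "recent_transactions": case["recent_transactions"],
        "customer_messages": case["customer_messages"],
    }, indent=2)

def draft_investigator_summary(case: dict) -> dict:
    prompt = (
        "Summarise this fraud case for an investigator. List key facts, open "
        "questions, and suggested next steps from the playbook. "
        "Do NOT state a fraud/no-fraud conclusion.\n\n" + build_case_packet(case)
    )
    draft = call_llm(prompt)
    # The output is a draft, not a decision: it lands in an investigator queue.
    return {"draft_summary": draft, "requires_human_review": True}
```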
2) Compliance and policy Q&A with bounded retrieval
Answer first: Policy Q&A only works when the model is forced to cite internal sources and refuse when sources are missing.
Models are known to invent rules when summarising long documents. In finance, that’s a hard stop. The fix is architectural:
- Use retrieval over a curated policy corpus
- Require quoted snippets + document IDs
- Add “refuse to answer” thresholds
- Log every question and response for audit
This is where specialised, organisation-specific implementations beat generic chat tools.
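A minimal sketch of the bounded-retrieval pattern, assuming you already have a retriever over a curated policy corpus. The retriever interface, score threshold, and prompt wording are assumptions to adapt, not a reference implementation.

```python
REFUSAL = "I can't answer that from the approved policy corpus; please contact Compliance."

def answer_policy_question(question: str, retriever, llm, min_score: float = 0.75) -> dict:
    """Answer only from retrieved policy snippets; refuse when evidence is weak."""
    hits = retriever.search(question, top_k=5)            # assumed retriever interface
    strong = [h for h in hits if h["score"] >= min_score]
    if not strong:
        result = {"answer": REFUSAL, "sources": [], "refused": True}
    else:
        context = "\n\n".join(f"[{h['doc_id']}] {h['snippet']}" for h in strong)
        prompt = (
            "Answer using ONLY the snippets below. Quote the relevant text and cite "
            "its document ID. If the snippets don't answer the question, say so.\n\n"
            f"{context}\n\nQuestion: {question}"
        )
        result = {
            "answer": llm(prompt),                        # assumed callable
            "sources": [h["doc_id"] for h in strong],
            "refused": False,
        }
    # Every question/response pair should be logged for audit (not shown here).
    return result
```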
3) Credit decision support with explainability and constraints
Answer first: Generative AI shouldn’t make credit decisions; it can make credit workflows faster and more consistent.
Good applications include:
- Document intake and classification (bank statements, payslips)
- Income and expense extraction with human verification
- Drafting credit assessment summaries
- Detecting missing documents and inconsistencies
Done right, this speeds origination without turning the LLM into a black-box decider.
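One way to keep the model out of the decision seat: let it extract structured fields, then validate them deterministically and queue anything that doesn’t reconcile. The field names and tolerance below are illustrative.

```python
def validate_extracted_income(extracted: dict, declared_income: float,
                              tolerance: float = 0.10) -> dict:
    """Cross-check model-extracted income against the applicant's declared figure."""
    issues = []
    if extracted.get("monthly_income") is None:
        issues.append("monthly income missing from extraction")
    else:
        gap = abs(extracted["monthly_income"] - declared_income) / max(declared_income, 1)
        if gap > tolerance:
            issues.append(f"extracted income differs from declared by {gap:.0%}")
    if not extracted.get("source_document_ids"):
        issues.append("no source document referenced")

    return {
        "fields": extracted,
        "issues": issues,
        # Anything with issues goes to a human verifier, not straight to decisioning.
        "route": "human_verification" if issues else "auto_accept_fields",
    }
```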
4) Customer service triage that prioritises empathy
Answer first: AI should route, summarise, and assist—then hand off cleanly.
Where finance teams get wins:
- Auto-classify intent (fee dispute, card replacement, mortgage query)
- Prefill CRM fields and case metadata
- Suggest compliant response templates
- Summarise the conversation so the human doesn’t ask customers to repeat themselves
This directly addresses a limit the industry keeps running into: empathy caps what end-to-end AI agents can handle. Treat empathy as a design requirement, not a marketing line.
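The handoff itself is worth designing deliberately. Here’s a sketch of what the human agent might receive; the intent labels and payload fields are examples, not a recommended taxonomy.

```python
ALLOWED_INTENTS = {"fee_dispute", "card_replacement", "mortgage_query", "other"}

def build_handoff(conversation: list[str], classify, summarise) -> dict:
    """Package an AI-assisted triage result for a human agent."""
    intent = classify(conversation)            # assumed classifier callable
    if intent not in ALLOWED_INTENTS:
        intent = "other"                       # never invent new categories
    return {
        "intent": intent,
        "summary": summarise(conversation),    # so the customer never repeats themselves
        "suggested_templates": [],             # filled from an approved template library
        "handled_by": "human_agent",           # the AI routes; the human responds
    }
```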
A practical playbook to move from pilots to production
If you’re trying to build a business case or justify budget in 2026 planning cycles, you need a story beyond “we’re experimenting.” Here’s what I’ve found works when financial services teams want real outcomes.
Step 1: Define “value” in operational metrics, not vibes
Pick 2–3 metrics per use case:
- Cost-to-serve reduction (minutes per case, calls avoided)
- Fraud loss reduction (basis points, prevented losses)
- Time-to-decision (credit or disputes)
- Quality improvements (rework rate, complaint rate)
- Revenue uplift (conversion, retention)
If you can’t measure it, it’s a demo.
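If minutes per case is your metric, the business case is simple arithmetic. Every number below is a placeholder; substitute your own baselines.

```python
# Back-of-envelope value model; every input is a placeholder.
cases_per_month = 4_000
minutes_saved_per_case = 6
loaded_cost_per_hour = 75.0        # fully loaded investigator/agent cost

monthly_value = cases_per_month * minutes_saved_per_case / 60 * loaded_cost_per_hour
print(f"Estimated monthly value: ${monthly_value:,.0f}")   # -> $30,000 with these inputs
```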
Step 2: Put boundaries around what the model is allowed to do
Most companies get this wrong. They let the model improvise.
Set boundaries:
- Inputs: what data sources are allowed
- Outputs: what formats are permitted (JSON, templates, structured fields)
- Actions: what it can trigger (ideally none at first)
- Refusals: when it must say “I can’t answer”
This is how you reduce the “jagged frontier” risk.
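In practice, “permitted output formats” means a schema the pipeline validates before anything downstream sees the answer. A minimal sketch using only the standard library; the required fields and allowed actions are assumptions.

```python
import json

ALLOWED_ACTIONS = {"none"}          # no triggered actions at first, per the boundary above

def validate_model_output(raw: str) -> dict:
    """Reject anything that isn't valid JSON in the agreed shape."""
    data = json.loads(raw)                              # raises on malformed output
    required = {"answer", "sources", "action", "refused"}
    missing = required - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {data['action']}")
    return data

# Example: a well-formed refusal passes; free-form prose does not.
validate_model_output('{"answer": "", "sources": [], "action": "none", "refused": true}')
```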
Step 3: Fix the boring stuff: entities, definitions, and data contracts
AI teams often want to skip data work because it’s slow. In finance, skipping it is how you end up with a model that’s confident and wrong.
Prioritise:
- A shared entity layer (customer, account, merchant, product)
- Canonical definitions (“arrears,” “default,” “chargeback,” “last business day”)
- Versioned data contracts between teams
This is also where many Australian banks are investing: not just models, but the infrastructure and governance that make models usable.
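A data contract doesn’t need exotic tooling. Even a small, versioned definition that producing and consuming teams both check in is a step up from tribal knowledge; the fields and thresholds below are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataContract:
    """Versioned agreement between a producing and a consuming team (illustrative)."""
    name: str
    version: str
    fields: dict          # field name -> type
    definitions: dict     # business terms pinned to one meaning

ARREARS_CONTRACT = DataContract(
    name="customer_arrears_daily",
    version="1.2.0",
    fields={"customer_id": "string", "days_past_due": "int", "as_of_date": "date"},
    definitions={
        "arrears": "days_past_due >= 1 on as_of_date",
        "default": "days_past_due >= 90 on as_of_date",   # pin the definition once, reuse everywhere
    },
)
```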
Step 4: Design human-in-the-loop like a product feature
Human review can’t be a vague promise. It needs workflow:
- Confidence scoring or validation rules
- Exception queues
- Clear accountability (who owns the final decision)
- Training loops (where feedback goes and how it changes prompts/models)
AI vendors themselves are adding “handholding” and forward-deployed engineering to get customers to value. Financial firms should mirror that internally: AI adoption is a services motion as much as a technology motion.
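Accountability is ultimately a routing rule, not a policy document. A minimal sketch, assuming a confidence score or validation result is available upstream; the thresholds and queue names are placeholders to tune.

```python
def route_case(confidence: float, auto_threshold: float = 0.90,
               review_threshold: float = 0.60) -> str:
    """Decide where an AI-assisted output goes next; thresholds are placeholders."""
    if confidence >= auto_threshold:
        return "auto_queue"        # still sampled for QA, never fully unmonitored
    if confidence >= review_threshold:
        return "exception_queue"   # a named team owns this queue and the final decision
    return "human_only"            # too uncertain: the AI output is advisory at best

# Feedback from the exception queue should flow back into prompts and models
# (that is the training loop in practice).
```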
Step 5: Treat model risk like financial risk
If you already run risk programs, reuse that muscle:
- Model inventory and change management
- Monitoring drift and performance over time
- Incident response playbooks
- Access controls and privacy reviews
AI that can’t be governed won’t scale—especially in regulated environments.
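Drift monitoring can start with standard model-risk tooling, for example a Population Stability Index over a score or feature distribution. A minimal sketch with placeholder bin proportions.

```python
import math

def population_stability_index(expected: list[float], actual: list[float]) -> float:
    """PSI between two binned distributions (proportions that each sum to 1)."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e = max(e, 1e-6)   # avoid log/division issues on empty bins
        a = max(a, 1e-6)
        psi += (a - e) * math.log(a / e)
    return psi

# Placeholder bin proportions: baseline quarter vs current month.
baseline = [0.25, 0.25, 0.25, 0.25]
current = [0.30, 0.27, 0.23, 0.20]
print(f"PSI = {population_stability_index(baseline, current):.3f}")
# Common rules of thumb flag values above roughly 0.1-0.25 for investigation.
```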
People also ask: common AI-in-finance questions (answered directly)
Is generative AI safe for banking customer communications?
Yes—when communications are constrained to approved knowledge sources and templates, and when escalation to humans is built into the flow.
Should banks build or buy AI solutions?
Buy for speed, build for differentiation. In practice, most institutions use a hybrid: vendor models plus in-house orchestration, data layers, and controls.
Why do AI pilots stall after a successful proof of concept?
Because proofs of concept optimise for “wow.” Production optimises for reliability, governance, integration, and measurable value.
What 2026 will reward: less magic, more discipline
Forrester expects companies to delay about 25% of planned AI spending by a year. That’s not an AI winter; it’s a reset toward implementations that survive contact with reality.
In finance and fintech, that reset is healthy. The winners won’t be the teams with the flashiest chatbot. They’ll be the ones who build repeatable workflows, clean up data foundations, and put controls around generative AI so it behaves like a trustworthy system.
If you’re planning your next AI initiative in fraud detection, credit scoring, or customer operations, focus on one question: Where can AI reduce cycle time or errors without taking on the full risk of the final decision? That’s where ROI shows up first.
Want a second opinion on your shortlist of AI use cases—or a sanity check on what it’ll take to get them into production in 2026? That’s a useful conversation to have before budget gets locked.