AI in finance is the future—but ROI lags today. Learn why pilots stall and how banks and fintechs can ship reliable AI for fraud, credit, and service.

AI in Finance: Why ROI Lags (and How to Fix It)
Only 15% of executives in a recent Forrester survey said AI improved profit margins over the last year. BCG’s number is even harsher: 5% reported “widespread value.” That gap between belief and results is the story of enterprise AI in 2025—and it’s especially loud in financial services.
Banks and fintechs aren’t short on ideas. They’re short on AI that behaves like software you can trust: consistent, auditable, and integrated with the messy reality of financial data, compliance, and customer expectations. The hype was “easy button.” The reality is more like “new operating model.”
This post is part of our AI in Finance and FinTech series, where we focus on practical adoption: fraud detection, credit scoring, trading, onboarding, and personalised financial experiences. The stance here is simple: AI will reshape finance, but only the institutions that treat reliability as the product will see real ROI.
Why AI ROI is stalling in financial services
AI ROI stalls because generative models are impressive in demos and unpredictable in production. Finance is a production environment.
Recent industry coverage captures the broader enterprise mood: leaders agree AI is the future, but they’re frustrated that it “doesn’t work right now.” That frustration comes from specific failure modes that show up fast in banks and fintechs:
- Sycophancy (over-agreeableness): Models try to please, not challenge. That’s a customer delight problem in wine recommendations—and a risk problem in credit, advice, and disputes.
- Inconsistency: The same question gets different answers. Finance teams can’t sign off on that.
- The “jagged frontier”: A model might excel at complex tasks (math, code) and fail at basic operational ones (dates, entities, definitions, policy recall).
- Data format chaos: Financial firms run on a patchwork of core systems, data vendors, spreadsheets, PDFs, and policy wikis. LLMs can “see patterns that don’t exist” when formats drift.
Here’s the uncomfortable truth: most AI programs don’t fail because the model is dumb; they fail because the organisation expects the model to compensate for weak process, weak data, and unclear decision rights.
The finance-specific reasons AI “doesn’t work right now”
Financial services has a tougher bar than most industries. When AI is wrong, it’s not just embarrassing—it can be a compliance incident, a customer harm issue, or a fraud loss.
The reliability gap: finance needs repeatable outputs
In practice, “AI accuracy” isn’t a single number. Banks need:
- Repeatability: Same inputs → same outputs (or explainable variance)
- Traceability: What data was used, what rules applied, what model version
- Escalation paths: Clear handoff to humans when confidence is low
- Controls: Access, logging, red-teaming, and monitoring
Generative AI often ships without these baked in. That’s why many firms see plenty of experimentation but limited production scale.
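To make that concrete, here’s a minimal sketch of the kind of record you could attach to every generated answer so repeatability and traceability are checkable after the fact. The field names, model identifiers, and threshold are illustrative, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIOutputRecord:
    """Illustrative audit record for one generated answer; adapt to your own schema."""
    question: str
    answer: str
    model_version: str            # which model produced this output
    prompt_version: str           # which prompt template was used
    source_document_ids: list     # what data the answer was grounded in
    confidence: float             # validation score, not a guarantee
    escalated_to_human: bool      # True when confidence falls below threshold
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = AIOutputRecord(
    question="What is the dispute window for card transactions?",
    answer="See policy DOC-123 (draft answer pending review).",
    model_version="model-2025-01",     # hypothetical identifier
    prompt_version="policy-qa-v7",     # hypothetical identifier
    source_document_ids=["DOC-123"],
    confidence=0.62,
    escalated_to_human=True,           # below an assumed 0.7 threshold
)
```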
The data gap: finance data isn’t LLM-ready
As noted above, varied formats can trip models into reading false patterns. In finance, that shows up everywhere:
- Merchant names differ across systems (e.g., “UBER *TRIP” vs “Uber BV”)
- Time windows are ambiguous (“last week” means different things across jurisdictions and reporting calendars)
- Customer identity is fragmented (multiple IDs across channels)
- Product terms live in PDFs and policy pages with exceptions and footnotes
If you don’t standardise entities and definitions, you’re asking the model to guess. Guessing is exactly what regulators and risk teams won’t accept.
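Standardising entities is mostly deterministic plumbing that runs before anything reaches a model. A minimal sketch, assuming a toy lookup table; a real programme would use merchant reference data and fuzzy matching rather than a hard-coded dict.

```python
import re

# Toy canonicalisation table; in practice this comes from merchant reference data.
CANONICAL_MERCHANTS = {
    "UBER TRIP": "Uber",
    "UBER BV": "Uber",
    "AMZN MKTP": "Amazon",
}

def canonical_merchant(raw: str) -> str:
    """Normalise a raw transaction descriptor to a canonical merchant name."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())   # strip punctuation like '*'
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    for prefix, name in CANONICAL_MERCHANTS.items():
        if cleaned.startswith(prefix):
            return name
    return cleaned.title()   # fall back to a tidied version and flag for review

print(canonical_merchant("UBER *TRIP"))   # -> Uber
print(canonical_merchant("Uber BV"))      # -> Uber
```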
The trust gap: customers still want humans for high-stakes moments
Industry coverage describes call centres “leaning back into humans.” Klarna’s experience is the pattern: AI handles simple tasks, but complex issues escalate.
Finance is full of complex issues:
- A disputed transaction with partial evidence
- A hardship request with emotional context
- A mortgage application with unusual income history
- A fraud case where the customer is panicked
A good AI strategy doesn’t replace humans; it protects human capacity for the moments that matter.
What actually works: high-impact, low-lift AI use cases in finance
The fastest path to ROI is to pick problems where (1) value is measurable, (2) failure is containable, and (3) workflows already exist.
1) Fraud operations co-pilots (not fully autonomous agents)
Answer first: A fraud co-pilot works because it reduces investigation time without owning the final decision.
Instead of “AI decides fraud,” use AI to:
- Summarise case history across systems
- Extract key facts from customer messages and call notes
- Draft investigator narratives for SAR/SMR preparation (with strict review)
- Recommend next-best actions based on playbooks
ROI shows up as minutes saved per case, faster queue throughput, and better investigator consistency.
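Here’s a rough sketch of the “assist, don’t decide” shape. Everything below is placeholder scaffolding: the case fields, the prompt wording, and the call_llm stub stand in for whatever case-management system and model provider you actually use.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for your model call; swap in your provider's client."""
    raise NotImplementedError

def build_case_packet(case: dict) -> str:
    """Assemble the facts an investigator would otherwise gather by hand."""
    return json.dumps({
        "customer_id": case["customer_id"],
        "alerts": case["alerts"],
        "recent_transactions": case["recent_transactions"],
        "customer_messages": case["customer_messages"],
    }, indent=2)

def draft_investigator_summary(case: dict) -> dict:
    prompt = (
        "Summarise this fraud case for an investigator. List key facts, open "
        "questions, and suggested next steps from the playbook. "
        "Do NOT state a fraud/no-fraud conclusion.\n\n" + build_case_packet(case)
    )
    draft = call_llm(prompt)
    # The output is a draft, not a decision: it lands in an investigator queue.
    return {"draft_summary": draft, "requires_human_review": True}
```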
2) Compliance and policy Q&A with bounded retrieval
Answer first: Policy Q&A only works when the model is forced to cite internal sources and refuse when sources are missing.
Models are known to invent rules when summarising long documents. In finance, that’s a hard stop. The fix is architectural:
- Use retrieval over a curated policy corpus
- Require quoted snippets + document IDs
- Add “refuse to answer” thresholds
- Log every question and response for audit
This is where specialised, organisation-specific implementations beat generic chat tools.
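A minimal sketch of the bounded-retrieval pattern, assuming you already have a retriever over a curated policy corpus. The retriever interface, score threshold, and prompt wording are assumptions to adapt, not a reference implementation.

```python
REFUSAL = "I can't answer that from the approved policy corpus; please contact Compliance."

def answer_policy_question(question: str, retriever, llm, min_score: float = 0.75) -> dict:
    """Answer only from retrieved policy snippets; refuse when evidence is weak."""
    hits = retriever.search(question, top_k=5)            # assumed retriever interface
    strong = [h for h in hits if h["score"] >= min_score]
    if not strong:
        result = {"answer": REFUSAL, "sources": [], "refused": True}
    else:
        context = "\n\n".join(f"[{h['doc_id']}] {h['snippet']}" for h in strong)
        prompt = (
            "Answer using ONLY the snippets below. Quote the relevant text and cite "
            "its document ID. If the snippets don't answer the question, say so.\n\n"
            f"{context}\n\nQuestion: {question}"
        )
        result = {
            "answer": llm(prompt),                        # assumed callable
            "sources": [h["doc_id"] for h in strong],
            "refused": False,
        }
    # Every question/response pair should be logged for audit (not shown here).
    return result
```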
3) Credit decision support with explainability and constraints
Answer first: Generative AI shouldn’t make credit decisions; it can make credit workflows faster and more consistent.
Good applications include:
- Document intake and classification (bank statements, payslips)
- Income and expense extraction with human verification
- Drafting credit assessment summaries
- Detecting missing documents and inconsistencies
Done right, this speeds origination without turning the LLM into a black-box decider.
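One way to keep the model out of the decision seat: let it extract structured fields, then validate them deterministically and queue anything that doesn’t reconcile. The field names and tolerance below are illustrative.

```python
def validate_extracted_income(extracted: dict, declared_income: float,
                              tolerance: float = 0.10) -> dict:
    """Cross-check model-extracted income against the applicant's declared figure."""
    issues = []
    if extracted.get("monthly_income") is None:
        issues.append("monthly income missing from extraction")
    else:
        gap = abs(extracted["monthly_income"] - declared_income) / max(declared_income, 1)
        if gap > tolerance:
            issues.append(f"extracted income differs from declared by {gap:.0%}")
    if not extracted.get("source_document_ids"):
        issues.append("no source document referenced")

    return {
        "fields": extracted,
        "issues": issues,
        # Anything with issues goes to a human verifier, not straight to decisioning.
        "route": "human_verification" if issues else "auto_accept_fields",
    }
```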
4) Customer service triage that prioritises empathy
Answer first: AI should route, summarise, and assist—then hand off cleanly.
Where finance teams get wins:
- Auto-classify intent (fee dispute, card replacement, mortgage query)
- Prefill CRM fields and case metadata
- Suggest compliant response templates
- Summarise the conversation so the human doesn’t ask customers to repeat themselves
This directly addresses a limit the industry keeps running into: empathy caps what end-to-end AI agents can handle. Treat empathy as a design requirement, not a marketing line.
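The handoff itself is worth designing deliberately. Here’s a sketch of what the human agent might receive; the intent labels and payload fields are examples, not a recommended taxonomy.

```python
ALLOWED_INTENTS = {"fee_dispute", "card_replacement", "mortgage_query", "other"}

def build_handoff(conversation: list[str], classify, summarise) -> dict:
    """Package an AI-assisted triage result for a human agent."""
    intent = classify(conversation)            # assumed classifier callable
    if intent not in ALLOWED_INTENTS:
        intent = "other"                       # never invent new categories
    return {
        "intent": intent,
        "summary": summarise(conversation),    # so the customer never repeats themselves
        "suggested_templates": [],             # filled from an approved template library
        "handled_by": "human_agent",           # the AI routes; the human responds
    }
```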
A practical playbook to move from pilots to production
If you’re trying to build a business case or justify budget in 2026 planning cycles, you need a story beyond “we’re experimenting.” Here’s what I’ve found works when financial services teams want real outcomes.
Step 1: Define “value” in operational metrics, not vibes
Pick 2–3 metrics per use case:
- Cost-to-serve reduction (minutes per case, calls avoided)
- Fraud loss reduction (basis points, prevented losses)
- Time-to-decision (credit or disputes)
- Quality improvements (rework rate, complaint rate)
- Revenue uplift (conversion, retention)
If you can’t measure it, it’s a demo.
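If minutes per case is your metric, the business case is simple arithmetic. Every number below is a placeholder; substitute your own baselines.

```python
# Back-of-envelope value model; every input is a placeholder.
cases_per_month = 4_000
minutes_saved_per_case = 6
loaded_cost_per_hour = 75.0        # fully loaded investigator/agent cost

monthly_value = cases_per_month * minutes_saved_per_case / 60 * loaded_cost_per_hour
print(f"Estimated monthly value: ${monthly_value:,.0f}")   # -> $30,000 with these inputs
```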
Step 2: Put boundaries around what the model is allowed to do
Most companies get this wrong. They let the model improvise.
Set boundaries:
- Inputs: what data sources are allowed
- Outputs: what formats are permitted (JSON, templates, structured fields)
- Actions: what it can trigger (ideally none at first)
- Refusals: when it must say “I can’t answer”
This is how you reduce the “jagged frontier” risk.
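In practice, “permitted output formats” means a schema the pipeline validates before anything downstream sees the answer. A minimal sketch using only the standard library; the required fields and allowed actions are assumptions.

```python
import json

ALLOWED_ACTIONS = {"none"}          # no triggered actions at first, per the boundary above

def validate_model_output(raw: str) -> dict:
    """Reject anything that isn't valid JSON in the agreed shape."""
    data = json.loads(raw)                              # raises on malformed output
    required = {"answer", "sources", "action", "refused"}
    missing = required - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {data['action']}")
    return data

# Example: a well-formed refusal passes; free-form prose does not.
validate_model_output('{"answer": "", "sources": [], "action": "none", "refused": true}')
```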
Step 3: Fix the boring stuff: entities, definitions, and data contracts
AI teams often want to skip data work because it’s slow. In finance, skipping it is how you end up with a model that’s confident and wrong.
Prioritise:
- A shared entity layer (customer, account, merchant, product)
- Canonical definitions (“arrears,” “default,” “chargeback,” “last business day”)
- Versioned data contracts between teams
This is also where many Australian banks are investing: not just models, but the infrastructure and governance that make models usable.
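A data contract doesn’t need exotic tooling. Even a small, versioned definition that producing and consuming teams both check in is a step up from tribal knowledge; the fields and thresholds below are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataContract:
    """Versioned agreement between a producing and a consuming team (illustrative)."""
    name: str
    version: str
    fields: dict          # field name -> type
    definitions: dict     # business terms pinned to one meaning

ARREARS_CONTRACT = DataContract(
    name="customer_arrears_daily",
    version="1.2.0",
    fields={"customer_id": "string", "days_past_due": "int", "as_of_date": "date"},
    definitions={
        "arrears": "days_past_due >= 1 on as_of_date",
        "default": "days_past_due >= 90 on as_of_date",   # pin the definition once, reuse everywhere
    },
)
```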
Step 4: Design human-in-the-loop like a product feature
Human review can’t be a vague promise. It needs workflow:
- Confidence scoring or validation rules
- Exception queues
- Clear accountability (who owns the final decision)
- Training loops (where feedback goes and how it changes prompts/models)
AI vendors themselves are adding “handholding” and forward-deployed engineering to get customers to value. Financial firms should mirror that internally: AI adoption is a services motion as much as a technology motion.
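Accountability is ultimately a routing rule, not a policy document. A minimal sketch, assuming a confidence score or validation result is available upstream; the thresholds and queue names are placeholders to tune.

```python
def route_case(confidence: float, auto_threshold: float = 0.90,
               review_threshold: float = 0.60) -> str:
    """Decide where an AI-assisted output goes next; thresholds are placeholders."""
    if confidence >= auto_threshold:
        return "auto_queue"        # still sampled for QA, never fully unmonitored
    if confidence >= review_threshold:
        return "exception_queue"   # a named team owns this queue and the final decision
    return "human_only"            # too uncertain: the AI output is advisory at best

# Feedback from the exception queue should flow back into prompts and models
# (that is the training loop in practice).
```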
Step 5: Treat model risk like financial risk
If you already run risk programs, reuse that muscle:
- Model inventory and change management
- Monitoring drift and performance over time
- Incident response playbooks
- Access controls and privacy reviews
AI that can’t be governed won’t scale—especially in regulated environments.
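Drift monitoring can start with standard model-risk tooling, for example a Population Stability Index over a score or feature distribution. A minimal sketch with placeholder bin proportions.

```python
import math

def population_stability_index(expected: list[float], actual: list[float]) -> float:
    """PSI between two binned distributions (proportions that each sum to 1)."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e = max(e, 1e-6)   # avoid log/division issues on empty bins
        a = max(a, 1e-6)
        psi += (a - e) * math.log(a / e)
    return psi

# Placeholder bin proportions: baseline quarter vs current month.
baseline = [0.25, 0.25, 0.25, 0.25]
current = [0.30, 0.27, 0.23, 0.20]
print(f"PSI = {population_stability_index(baseline, current):.3f}")
# Common rules of thumb flag values above roughly 0.1-0.25 for investigation.
```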
People also ask: common AI-in-finance questions (answered directly)
Is generative AI safe for banking customer communications?
Yes—when communications are constrained to approved knowledge sources and templates, and when escalation to humans is built into the flow.
Should banks build or buy AI solutions?
Buy for speed, build for differentiation. In practice, most institutions use a hybrid: vendor models plus in-house orchestration, data layers, and controls.
Why do AI pilots stall after a successful proof of concept?
Because proofs of concept optimise for “wow.” Production optimises for reliability, governance, integration, and measurable value.
What 2026 will reward: less magic, more discipline
Forrester expects companies to delay about 25% of planned AI spending by a year. That’s not an AI winter; it’s a reset toward implementations that survive contact with reality.
In finance and fintech, that reset is healthy. The winners won’t be the teams with the flashiest chatbot. They’ll be the ones who build repeatable workflows, clean up data foundations, and put controls around generative AI so it behaves like a trustworthy system.
If you’re planning your next AI initiative in fraud detection, credit scoring, or customer operations, focus on one question: Where can AI reduce cycle time or errors without taking on the full risk of the final decision? That’s where ROI shows up first.
Want a second opinion on your shortlist of AI use cases—or a sanity check on what it’ll take to get them into production in 2026? That’s a useful conversation to have before budget gets locked.