AI in Finance: Why It’s Not Working (Yet)

AI in Finance and FinTech · By 3L3C

AI in finance is promising but unreliable today. Learn why ROI lags—and the practical workflow-first steps banks and fintechs can take for 2026.

AI ROI · Generative AI · Banking Operations · FinTech Strategy · AI Governance · Customer Experience

Only 15% of executives in a Forrester survey said AI improved profit margins over the last year. BCG found just 5% saw widespread value. Those numbers don’t read like a hype cycle; they read like an implementation problem.

If you work in banking or fintech, you’ve probably felt the gap first-hand: leadership is sold on the future of generative AI, but teams are stuck wrestling with messy data, inconsistent model behavior, and the uncomfortable truth that customers still want humans for anything even slightly complicated.

I’m bullish on AI in finance. But I’m also convinced most organisations are approaching it backwards—starting with shiny tools instead of specific workflows, governance, and the “boring” operational plumbing that makes AI reliable. The reality? The winners in 2026 won’t be the companies that bought the biggest AI contract. They’ll be the ones that made AI predictable.

The problem isn’t belief—it’s reliability and ROI

Business leaders aren’t arguing about whether AI matters. They’re frustrated that it doesn’t work consistently enough to justify large-scale rollout.

That pattern shows up across industries, and finance amplifies it. When a wine app’s AI sommelier is “too nice,” it’s a brand quirk. When a bank’s AI assistant is “too nice,” it can become a mis-selling risk, a compliance incident, or a customer remediation bill.

Why ROI is hard to prove in financial services

Generative AI adoption in finance runs into three classic traps:

  1. Value is real but diffuse: If AI saves analysts 20 minutes here and 12 minutes there, it’s helpful—but not always captured in a clean P&L line.
  2. Controls add friction: Model risk management, privacy, and audit trails slow deployment (for good reasons).
  3. The hard costs are obvious: Data work, security reviews, integration, and change management are visible, budgeted, and painful, so the cost side of the ROI case is concrete while the benefit side isn’t.

This is why many firms are quietly shifting from “big bang transformation” to targeted automation—fraud ops, AML triage, customer service deflection, credit decisioning support—where there’s a measurable baseline and a clear owner.

Snippet-worthy truth: In finance, AI value isn’t “can it answer?” It’s “can it answer correctly, consistently, and in a way we can defend later?”

The ‘jagged frontier’ hits finance harder than most industries

AI can write code, summarise documents, and draft customer emails—then fail at something that feels trivial, like interpreting a time period (“last week”) or mapping a suburb to the right region. Researchers call this the “jagged frontier”: impressive performance in one domain, surprising failure in another.

Finance is full of jagged edges:

  • Inconsistent source data (core banking, CRM, payment rails, collections notes, scanned PDFs)
  • Ambiguous language (“arrears,” “delinquency,” “hardship,” “chargeback,” “dispute”) that varies by product and jurisdiction
  • Context-dependent rules (policy exceptions, thresholds, changing regulations)

A model that “mostly works” is still unacceptable when errors create regulatory exposure.

The data formatting tax (and why AI exposes it)

One of the most practical observations from the source story is that AI can “read patterns that don’t exist” when data is formatted differently across systems.

In financial services, that’s not hypothetical. It’s daily reality:

  • Merchant names appear in multiple forms across card, EFT, and wallet rails
  • Counterparty identifiers differ between systems
  • Product codes and status flags vary across business lines
  • Free-text notes in collections and customer support are messy and emotionally charged

If you want reliable AI for fraud detection, credit scoring support, or personalised financial advice, you end up paying a data reformatting and definition alignment bill first. Many teams try to skip that bill. They don’t skip it—they just postpone it and suffer later.

“We thought it’d be the easy button”: where AI projects go wrong

A recurring theme from executives is blunt: they expected an easy button.

That expectation is especially common in finance because so many AI demos look magical: a chatbot summarises policy documents, a model drafts a customer response, an agent explains a transaction history. Then the pilot hits production constraints:

  • Access controls and entitlement checks
  • PII handling and redaction
  • Audit logs and evidence trails
  • Human review requirements
  • Integration with ticketing, workflow, and approval systems

The most common failure pattern I see

Most teams start with: “Let’s build an AI assistant.”

They should start with: “Which decision or workflow step is expensive, repetitive, and measurable?”

Here’s the difference:

  • Assistant-first leads to open-ended scope, unclear success metrics, and endless edge cases.
  • Workflow-first leads to bounded tasks, clear evaluation, and a realistic control model.

If your use case can’t answer these questions, it’s not ready:

  1. What is the decision the AI influences?
  2. What is the cost of a wrong answer?
  3. What’s the fallback when confidence is low?
  4. Who owns the KPI and signs off on risk?

Customer service is the clearest proof: humans aren’t going away

Payments and telco examples in the source reinforce what financial services teams learn quickly: AI is great at routine interactions, and it struggles with emotional, complex, or ambiguous cases.

In fintech customer operations, the best results typically come from a hybrid model:

  • AI handles authentication steps, basic FAQs, and transaction explanations
  • AI drafts responses for agents (agent-assist)
  • Humans take over for hardship, disputes, fraud trauma, complaints, and complex product questions

This isn’t a retreat from AI. It’s maturity.

Practical takeaway for banks and fintechs

If you want generative AI to reduce contact centre cost without damaging customer experience, instrument three things from day one (a rough sketch of the calculations follows this list):

  • Measure containment rate (what % of issues are resolved without a human)
  • Track transfer quality (does the AI pass context cleanly to the agent?)
  • Optimise for time-to-human on high-friction cases
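
As a rough illustration, here is one way those three metrics could be computed from a log of contact records. The Contact fields (resolved_without_human, context_passed, high_friction, and so on) are illustrative assumptions, not a real contact-centre schema.

```python
from dataclasses import dataclass

@dataclass
class Contact:
    resolved_without_human: bool    # closed by the AI channel alone
    handed_off: bool                # conversation reached a human agent
    context_passed: bool            # agent received intent + history on handoff
    high_friction: bool             # hardship, dispute, fraud, complaint
    seconds_to_human: float | None  # time from first message to human pickup

def containment_rate(contacts: list[Contact]) -> float:
    """Share of contacts resolved with no human involvement."""
    return sum(c.resolved_without_human for c in contacts) / len(contacts) if contacts else 0.0

def transfer_quality(contacts: list[Contact]) -> float:
    """Share of handoffs where the AI passed context cleanly to the agent."""
    handoffs = [c for c in contacts if c.handed_off]
    return sum(c.context_passed for c in handoffs) / len(handoffs) if handoffs else 1.0

def median_time_to_human(contacts: list[Contact]) -> float | None:
    """Median seconds-to-human for high-friction cases (the number to drive down)."""
    times = sorted(c.seconds_to_human for c in contacts
                   if c.high_friction and c.seconds_to_human is not None)
    if not times:
        return None
    mid = len(times) // 2
    return times[mid] if len(times) % 2 else (times[mid - 1] + times[mid]) / 2
```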

One line I agree with strongly: empathy is a hard blocker for fully automated customer conversations right now. In financial services, empathy isn’t just “nice”—it’s often the difference between a complaint and a retained customer.

The better approach for AI in finance: start small, then earn scale

Forrester predicts companies will delay about 25% of planned AI spending by a year. That doesn’t mean AI is failing. It means organisations are learning what “production AI” actually costs.

The playbook that works in finance is not glamorous, but it’s effective.

Step 1: Choose “high impact, low lift” finance workflows

Good early wins share three traits: clear inputs, repeat volume, and measurable outputs.

Examples in AI in finance and fintech:

  • Fraud ops triage: summarise cases, cluster similar patterns, draft SAR narratives for review
  • AML alert enrichment: compile customer and transaction context into a standard analyst view
  • Credit memo drafting: generate first-pass writeups with citations to source fields
  • Collections agent-assist: propose compliant call scripts and next-best actions
  • Customer dispute intake: classify dispute type and gather required evidence checklist

Notice what’s missing: “autonomous decisions.” Early wins are about speed and consistency, with humans still accountable.

Step 2: Design for “No” (and for uncertainty)

The wine app story nailed a subtle truth: models often need permission to be critical.

In finance, you want the model to say:

  • “I don’t have enough information to answer.”
  • “This conflicts with policy X.”
  • “Route to a licensed adviser / compliance / human agent.”

This is not just prompting. It’s product design (a rough sketch follows this list):

  • Confidence thresholds
  • Guardrails by intent (general information vs regulated advice)
  • Retrieval that cites internal policy snippets
  • Hard blocks on restricted topics
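
To make that concrete, here is a minimal sketch of what “design for No” can look like as routing logic. The classify_intent, retrieve_policy, and generate functions are hypothetical stand-ins for whatever intent model, retrieval layer, and LLM call a team actually uses, and the intent labels and threshold are illustrative, not a prescribed standard.

```python
RESTRICTED_INTENTS = {"personal_financial_advice", "tax_advice", "legal_advice"}
MIN_RETRIEVAL_SCORE = 0.75  # below this, the model must not answer from memory

def answer_or_escalate(question: str, classify_intent, retrieve_policy, generate):
    """Route a customer question: answer with citations, decline, or hand to a human.

    classify_intent, retrieve_policy and generate are stand-ins for the team's
    actual intent model, retrieval layer and LLM call.
    """
    intent = classify_intent(question)

    # Hard block: regulated topics never get an automated answer.
    if intent in RESTRICTED_INTENTS:
        return {"action": "route_to_human", "reason": f"restricted intent: {intent}"}

    snippets = retrieve_policy(question)  # [(policy_id, text, score), ...]
    best_score = max((s[2] for s in snippets), default=0.0)

    # Low confidence: say "I don't know" instead of guessing.
    if best_score < MIN_RETRIEVAL_SCORE:
        return {"action": "decline",
                "message": "I don't have enough information to answer that reliably."}

    # Answer only from cited policy text, and keep the citations for audit.
    draft = generate(question, context=[s[1] for s in snippets])
    return {"action": "answer", "message": draft,
            "citations": [s[0] for s in snippets]}
```

The point is that refusal and escalation are explicit return paths in the product, not suggestions buried in a prompt.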

Step 3: Make data boring again (standardise it)

If your AI pilot relies on heroic prompt engineering to compensate for inconsistent data, you’re building on sand.

A more durable path:

  • Define canonical entities (customer, account, transaction, merchant)
  • Standardise time windows and location mapping
  • Create a governed feature store for modelling and analytics teams
  • Build a “golden set” of documents for retrieval (current policies, product T&Cs, procedures)

This is where many teams win or lose.
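
As one small example of what “making data boring” means in practice, the sketch below normalises merchant names and maps raw records from different systems into a single canonical transaction shape before anything reaches a model. The field names and cleaning rules are assumptions about a typical card/EFT mix, not a universal schema.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CanonicalTransaction:
    transaction_id: str
    customer_id: str
    merchant_key: str      # normalised merchant name shared across rails
    amount_cents: int
    occurred_at: datetime  # always UTC, never "last week"

def normalise_merchant(raw_name: str) -> str:
    """Collapse the many spellings of one merchant into a single key."""
    name = raw_name.upper()
    name = re.sub(r"[*#]\d+", " ", name)                 # drop store / terminal numbers
    name = re.sub(r"[^A-Z0-9 ]", " ", name)              # strip punctuation
    name = re.sub(r"\b(PTY|LTD|INC|LLC)\b", " ", name)   # drop entity suffixes
    return re.sub(r"\s+", " ", name).strip()

def to_canonical(raw: dict, source_system: str) -> CanonicalTransaction:
    """Map a raw record from any source system into the shared shape."""
    return CanonicalTransaction(
        transaction_id=f"{source_system}:{raw['id']}",
        customer_id=str(raw["customer_id"]),
        merchant_key=normalise_merchant(raw.get("merchant", "")),
        amount_cents=round(float(raw["amount"]) * 100),
        # assumes ISO-8601 timestamps; convert everything to UTC up front
        occurred_at=datetime.fromisoformat(raw["timestamp"]).astimezone(timezone.utc),
    )
```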

Step 4: Evaluate like a financial institution, not a chatbot hobbyist

A finance-ready evaluation plan includes:

  • Accuracy by category (not one blended score)
  • Hallucination rate on restricted topics
  • Stability across repeated runs
  • Bias and fairness checks (especially for credit-related outputs)
  • Auditability: can you recreate what the model saw and why it answered the way it did?

If you can’t explain it to a regulator, it’s not ready for production.
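
Here is a minimal sketch of the first and third items, per-category accuracy and run-to-run stability, over a labelled test set. EvalCase, model, and grade are hypothetical stand-ins; a real programme would layer hallucination, fairness, and auditability checks on top.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class EvalCase:
    category: str   # e.g. "fees", "hardship", "restricted_advice"
    question: str
    expected: str   # reviewed ground-truth answer (or "DECLINE")

def accuracy_by_category(cases, model, grade):
    """Score each category separately instead of reporting one blended number.

    model(question) returns an answer; grade(answer, expected) returns True/False.
    Both are stand-ins for whatever the team actually uses.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for case in cases:
        totals[case.category] += 1
        if grade(model(case.question), case.expected):
            hits[case.category] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}

def stability(question: str, model, runs: int = 5) -> float:
    """Fraction of repeated runs that return the single most common answer."""
    answers = [model(question) for _ in range(runs)]
    most_common = max(set(answers), key=answers.count)
    return answers.count(most_common) / runs
```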

Why 2026 will reward the “handholding” vendors and teams

The source notes that AI labs and application vendors are increasingly embedding experts with customers. That’s not a services upsell; it’s recognition that adoption is mostly workflow engineering and change management.

In financial services, the most valuable “AI capability” is often cross-functional:

  • Product + operations know the edge cases
  • Data teams know the quirks of lineage and quality
  • Risk and compliance set the safe operating boundary
  • Engineering makes it reliable and observable

Firms that build this muscle will move faster each quarter. Firms that treat AI as a plug-in will keep re-running pilots.

Snippet-worthy truth: AI in finance scales when governance, data, and workflow design scale—not when prompts get smarter.

What to do next if you’re responsible for AI adoption in finance

If you’re staring at a 2026 roadmap and feeling pressure to “show ROI,” I’d focus on three moves:

  1. Pick one workflow per business line with a clear KPI (minutes saved, losses reduced, faster resolution time).
  2. Build a human-in-the-loop operating model from day one (who reviews what, and when the AI must stop).
  3. Invest in data standardisation specifically for your chosen workflow, not as a never-ending enterprise program.

This post is part of our AI in Finance and FinTech series, and this theme will keep coming up: the institutions seeing value aren’t waiting for perfect models. They’re building practical systems that assume imperfection—and still deliver safe, measurable outcomes.

If your AI plans for 2026 had to be cut down to two production bets, which workflows would you choose—and what would you measure to prove they worked?
