AI Data Ingestion for Underwriting: Lessons from IAG

AI in Finance and FinTech · By 3L3C

AI data ingestion is now core underwriting infrastructure. Here’s what IAG’s 98% accuracy push teaches finance teams about scaling AI safely.

Tags: AI in insurance, data ingestion, intelligent document processing, underwriting, automation, fintech operations

A 98% extraction accuracy target sounds like a technical detail—until you translate it into human hours. If a property underwriter is burning up to half a day re-keying partner documents into seven different systems, that’s not “ops overhead.” That’s lost growth, slower quotes, and talented people doing work nobody hired them to do.

That’s why I paid attention to IAG’s push to rework high-volume data ingestion for its intermediated insurance brands (CGU and WFI). The headline isn’t “insurer uses AI.” The story is more practical: IAG treated document ingestion as core underwriting infrastructure, then used AI + automation to remove the worst of the manual work—without pretending accuracy doesn’t matter.

For readers following our AI in Finance and FinTech series, this is the same pattern we’ve seen in banking AI for credit scoring and fraud detection: the biggest wins don’t start with fancy models. They start with clean, timely, structured data that decision systems can trust.

Why data ingestion is the bottleneck in risk decisions

The simplest way to say it: you can’t price risk you can’t read. And in commercial property, “reading” often means extracting structured fields from semi-structured inputs.

In IAG’s case, property underwriters were receiving three key partner documents:

  • Asset schedules (anywhere from a couple of properties to thousands of locations)
  • Risk schedules (the exposures the customer wants transferred)
  • Prior history (past claims and coverage signals)

When those arrive as PDFs, spreadsheets, scans, or mixed formats, underwriting becomes a data entry exercise before it becomes a risk assessment exercise.

The finance parallel: underwriting and credit are the same data problem

Banks face a similar ingestion mess in loan origination:

  • Payslips, bank statements, business activity statements
  • Identity documents
  • Broker-submitted application packs

If your ingestion is slow or error-prone, your AI credit scoring model is either starved of data or fed garbage. Same for insurance risk models. AI in finance only performs as well as the intake pipeline feeding it.

What IAG did differently (and why it worked)

IAG’s approach is notable because it wasn’t framed as “let’s add AI.” It was framed as commercial enablement: reduce admin time, reduce manual controls, and free underwriters to underwrite.

They also set an appropriately hard metric: circa 98% accuracy for automated extraction and ingestion into underwriting systems.

That number matters. In property underwriting, a single misread value—construction type, flood exposure indicator, sum insured, occupancy, location count—can distort the quote or drive downstream rework.

OCR was the start—LLMs were the acceleration

Many organisations start with OCR. OCR is useful, but it struggles with:

  • Inconsistent templates from brokers/partners
  • Tables that span pages
  • Footnotes, exceptions, and “special conditions” language
  • Mixed units, currency formats, and abbreviations

IAG initially used OCR-driven approaches, then piloted AI and large language models with Appian. Early results landed at 68% accuracy—not production-ready. But in a couple of months they reached ~96–98% confidence in the proof-of-concept phase.

Here’s the insight: LLMs aren’t magic; they’re adaptable. When you pair them with workflow, validation rules, and feedback loops, they can cope with variation that brittle template-based extraction can’t.

A useful stance for any financial services team: treat AI as part of the process design, not an “assistant” bolted onto a broken process.

The hidden architecture behind “AI-powered decisions”

When people talk about AI in finance, they often picture the model: fraud classifier, credit scorecard, claims triage engine. But the model is the last mile. The real work is upstream.

A production-grade ingestion pipeline usually includes:

1) Document intake and classification

Answer first: You need to reliably identify what came in before you extract anything.

  • Detect document type (asset schedule vs risk schedule vs history)
  • Route based on channel (broker portal, email, upload)
  • Capture metadata (partner ID, customer ID, submission date)
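
To make that concrete, here is a minimal Python sketch of the intake step. The document types, channels, and keyword heuristic are illustrative assumptions, not IAG's actual rules; in practice you would back this first pass with a trained classifier and send anything ambiguous to human triage.

```python
# Minimal sketch of intake and classification: identify what came in and
# attach routing metadata before any extraction runs. All names are illustrative.
from dataclasses import dataclass
from enum import Enum


class DocType(Enum):
    ASSET_SCHEDULE = "asset_schedule"
    RISK_SCHEDULE = "risk_schedule"
    PRIOR_HISTORY = "prior_history"
    UNKNOWN = "unknown"


@dataclass
class Submission:
    filename: str
    channel: str          # e.g. "broker_portal", "email", "upload"
    partner_id: str
    customer_id: str
    submission_date: str  # ISO 8601


def classify(sub: Submission) -> DocType:
    """Cheap first-pass classification; anything ambiguous goes to human triage."""
    name = sub.filename.lower()
    if "asset" in name:
        return DocType.ASSET_SCHEDULE
    if "risk" in name:
        return DocType.RISK_SCHEDULE
    if "claims" in name or "history" in name:
        return DocType.PRIOR_HISTORY
    return DocType.UNKNOWN  # never guess; route to a person instead


print(classify(Submission("CGU_asset_schedule_2025.xlsx", "broker_portal",
                          "P-014", "C-9921", "2025-11-28")))
# DocType.ASSET_SCHEDULE
```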

2) Extraction with constraints (not free-form text)

Answer first: Extraction must output structured fields aligned to your underwriting/claims data model.

Good implementations don’t ask AI to “summarise the PDF.” They ask it to produce specific fields with checks, such as:

  • Location address, occupancy type, construction type
  • Sum insured, deductibles, limits
  • Count of locations and asset categories
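
Here is a minimal sketch of what "extraction with constraints" can look like, assuming a generic `call_llm` function as a stand-in for whatever model endpoint you use and an illustrative field list. The point is the shape: a fixed schema, JSON-only output, and parsing that fails loudly instead of accepting free-form prose.

```python
# Sketch of constrained extraction: ask for specific fields as JSON, then parse
# against a fixed schema. `call_llm` and the field names are assumptions.
import json
from dataclasses import dataclass


@dataclass
class LocationRecord:
    address: str
    occupancy_type: str
    construction_type: str
    sum_insured: float
    deductible: float


PROMPT_TEMPLATE = (
    "Extract every location from the schedule below. Return ONLY a JSON array "
    "of objects with keys: address, occupancy_type, construction_type, "
    "sum_insured, deductible.\n\n{document}"
)


def extract_locations(document_text: str, call_llm) -> list[LocationRecord]:
    raw = call_llm(PROMPT_TEMPLATE.format(document=document_text))
    rows = json.loads(raw)  # fails loudly if the model returns free-form prose
    records = []
    for row in rows:
        # Missing or non-numeric values raise here and fall through to exception handling
        records.append(LocationRecord(
            address=row["address"],
            occupancy_type=row["occupancy_type"],
            construction_type=row["construction_type"],
            sum_insured=float(row["sum_insured"]),
            deductible=float(row["deductible"]),
        ))
    return records
```

Anything that cannot be parsed into the schema never reaches the underwriting system; it falls through to the validation and exception layer below.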

3) Validation and exception handling

Answer first: The goal isn’t 100% automation; it’s predictable handling of the 2–4% that fails.

This is where teams win or lose the business case.

Effective patterns include:

  • Field-level confidence thresholds (only auto-post above X)
  • Cross-field rules (e.g., totals must equal sum of line items)
  • Human review queue that shows only the uncertain fields
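
Here is a rough sketch of those patterns working together. The 0.95 threshold, the field names, and the tolerance are placeholders you would tune per field and per partner.

```python
# Sketch of the validation layer: auto-post only when every field clears its
# confidence threshold and cross-field rules hold; otherwise queue just the
# flagged fields for review. All values below are illustrative.
CRITICAL_THRESHOLD = 0.95  # assumed cutoff; in practice, tuned per field


def validate(extracted: dict, confidences: dict, line_items: list[float]):
    issues = []

    # Field-level confidence: only auto-post values above the threshold
    for field_name, score in confidences.items():
        if score < CRITICAL_THRESHOLD:
            issues.append((field_name, f"low confidence ({score:.2f})"))

    # Cross-field rule: declared total must equal the sum of line items
    declared = extracted.get("total_sum_insured", 0.0)
    if abs(declared - sum(line_items)) > 0.01:
        issues.append(("total_sum_insured",
                       f"declared {declared} != line-item sum {sum(line_items)}"))

    # Predictable handling: auto-post if clean, otherwise queue only the flagged fields
    return ("auto_post", []) if not issues else ("review_queue", issues)


decision, flagged = validate(
    {"total_sum_insured": 1_250_000.0},
    {"total_sum_insured": 0.99, "construction_type": 0.82},
    line_items=[500_000.0, 750_000.0],
)
print(decision, flagged)  # review_queue [('construction_type', 'low confidence (0.82)')]
```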

4) Integration into core systems

Answer first: Value arrives only when extracted data lands in the system underwriters actually use.

IAG’s underwriters previously touched seven systems. That’s common across financial services when platforms have grown by acquisition and patchwork automation.

Integration is where you stop doing demos and start doing outcomes:

  • Posting structured values into underwriting tools
  • Versioning schedules so changes are trackable
  • Logging audit trails (who/what changed a value)
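
As a sketch of that integration discipline in code, assuming a `post_to_underwriting_system` callable as a stand-in for your core-system API:

```python
# Sketch of posting extracted values with versioning and an audit trail.
# The record shape and the downstream call are illustrative assumptions.
from datetime import datetime, timezone


def post_with_audit(record: dict, source_doc_id: str, actor: str, version: int,
                    audit_log: list, post_to_underwriting_system):
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,                   # "pipeline" or a reviewer's user id
        "source_document": source_doc_id,
        "version": version,               # schedules are versioned, never silently overwritten
        "fields": record,
    }
    audit_log.append(entry)               # in production: an append-only store
    post_to_underwriting_system(record, version=version)
    return entry
```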

Where most financial organisations get this wrong

Most teams underestimate two things: variance and governance.

Variance: partner documents are a moving target

Brokers and partners don’t coordinate their templates. One partner updates their schedule format and your extraction breaks.

If you’re building AI ingestion for insurance or banking, design for:

  • Multiple templates per document type
  • New columns appearing without notice
  • One-off “special cases” that are actually frequent

A practical approach is to treat extraction as a product:

  • Maintain a test suite of real documents (anonymised)
  • Track accuracy by partner and by document type
  • Ship improvements on a cadence (weekly/fortnightly)
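
A sketch of what "track accuracy by partner and by document type" can look like, assuming you keep a labelled set of anonymised documents with known-good answers:

```python
# Sketch of a regression suite for extraction-as-a-product: accuracy reported
# per (partner, document type) so one broker's template change is visible.
from collections import defaultdict


def accuracy_report(test_cases, extract_fn):
    """test_cases: iterable of (partner_id, doc_type, document_text, expected_fields)."""
    stats = defaultdict(lambda: [0, 0])  # (partner, doc_type) -> [correct, total]
    for partner, doc_type, text, expected in test_cases:
        predicted = extract_fn(text)
        for field_name, truth in expected.items():
            stats[(partner, doc_type)][1] += 1
            if predicted.get(field_name) == truth:
                stats[(partner, doc_type)][0] += 1
    return {key: correct / total for key, (correct, total) in stats.items()}
```

Run it on every change, and a single partner's template update shows up as a dip in one cell rather than a vague sense that "extraction got worse."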

Governance: automation increases the blast radius of mistakes

Manual entry is slow, but its failures are local. Automation is fast, so its failures can scale.

For regulated finance and insurance teams, governance isn’t optional. You need:

  • Audit logs for extracted values and overrides
  • Clear accountability for model updates and rule changes
  • Access controls and segregation of duties
  • Data retention policies for source documents and derived fields
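
As one concrete example, here is a sketch of an override control that combines two of those requirements, audit logging and segregation of duties. The record shape and the rule are illustrative, not a compliance template.

```python
# Sketch of a governance control: every override is logged, and the approver
# cannot be the person who made the change. Field names are illustrative.
def record_override(field_name: str, old_value, new_value, changed_by: str,
                    approved_by: str, override_log: list):
    if changed_by == approved_by:
        raise PermissionError("segregation of duties: approver must differ from the editor")
    override_log.append({
        "field": field_name,
        "old_value": old_value,
        "new_value": new_value,
        "changed_by": changed_by,
        "approved_by": approved_by,
    })
```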

The punchline: AI makes governance more important, not less.

Practical playbook: building AI ingestion that underwriters trust

If you’re considering an AI ingestion initiative (insurance, banking, or fintech), here’s what works in practice.

Step 1: Pick the “painful and frequent” workflow

Target the process with:

  • High volume
  • High manual effort
  • Clear value if reduced (quote turnaround time, cost-to-serve)

IAG’s choice—complex property schedules—was ambitious. That’s risky, but it also means the capability can later stretch across acquisition and claims, which they’ve already flagged.

Step 2: Define accuracy like the business experiences it

A headline number like “98% accuracy” is meaningless unless you define:

  • Which fields matter most (critical vs nice-to-have)
  • Whether accuracy is per-field, per-document, or per-submission
  • How you measure and sample for QA

A useful underwriting definition:

  • Critical fields: must be correct for auto-post
  • Review fields: can be suggested but require human confirmation
  • Reference fields: stored for context, not pricing
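
Here is a sketch of how those definitions translate into measurement, assuming example field tiers: per-field accuracy for everything, plus a per-document rate where a document only counts as correct if every critical field is right.

```python
# Sketch of accuracy measured the way the business experiences it.
# The critical-field set and field names are example assumptions.
CRITICAL_FIELDS = {"sum_insured", "construction_type", "location_count"}


def score(documents):
    """documents: list of dicts, each with 'predicted' and 'expected' field maps."""
    field_hits, field_totals = {}, {}
    docs_ok = 0
    for doc in documents:
        critical_ok = True
        for fname, truth in doc["expected"].items():
            correct = doc["predicted"].get(fname) == truth
            field_totals[fname] = field_totals.get(fname, 0) + 1
            field_hits[fname] = field_hits.get(fname, 0) + int(correct)
            if fname in CRITICAL_FIELDS and not correct:
                critical_ok = False
        docs_ok += int(critical_ok)
    per_field = {f: field_hits[f] / field_totals[f] for f in field_totals}
    return per_field, docs_ok / len(documents)
```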

Step 3: Design for the 2% edge cases on day one

If you want adoption, underwriters must see that the system behaves sensibly when it’s unsure.

Build:

  • A clean exception queue
  • A “why this is uncertain” indicator
  • Fast correction workflows that feed learning
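
For illustration, here is a minimal shape for an exception item that covers those three requirements. The fields and reason text are assumptions, not a product spec.

```python
# Sketch of an exception queue item: only the uncertain field, a plain-language
# reason, and a correction that is captured once and reused as training signal.
from dataclasses import dataclass


@dataclass
class ExceptionItem:
    submission_id: str
    field_name: str
    suggested_value: str
    reason: str                    # e.g. "confidence 0.82; two candidate columns matched"
    correction: str | None = None  # filled in by the reviewer


def resolve(item: ExceptionItem, corrected_value: str, reviewer: str,
            training_examples: list):
    """Capture the fix once, then reuse it as a labelled example for the next update."""
    item.correction = corrected_value
    training_examples.append({
        "field": item.field_name,
        "suggested": item.suggested_value,
        "correct": corrected_value,
        "reason_flagged": item.reason,
        "reviewer": reviewer,
    })
```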

Step 4: Connect ingestion to downstream decisioning

The best metric isn’t “documents processed.” It’s business throughput:

  • Quote turnaround time
  • Bind rate (quotes that convert)
  • Referral rate (how often automation still triggers manual review)
  • Underwriter capacity released (hours/week)
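
A small sketch of reporting those numbers from case records rather than from "documents processed"; the field names and the 30-minutes-saved figure are illustrative assumptions.

```python
# Sketch of business-throughput reporting for an ingestion pipeline.
from statistics import median


def throughput_metrics(cases, minutes_saved_per_auto_case: float = 30.0):
    """cases: dicts with datetime 'received_at'/'quoted_at' and a bool 'auto_posted'."""
    turnaround_hours = [(c["quoted_at"] - c["received_at"]).total_seconds() / 3600
                        for c in cases]
    auto_cases = [c for c in cases if c["auto_posted"]]
    return {
        "median_turnaround_hours": median(turnaround_hours),
        "referral_rate": 1 - len(auto_cases) / len(cases),  # how often a human still steps in
        "underwriter_hours_released": len(auto_cases) * minutes_saved_per_auto_case / 60,
    }
```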

This is where AI in finance becomes real: better data intake leads to better risk selection, better pricing discipline, and better customer experience.

What this means for AI in Finance and FinTech in 2026

The timing matters. December is when many risk and operations leaders are locking roadmaps and budgets for the new year. If AI is on your 2026 plan, document ingestion is one of the safest places to start because it ties directly to speed, accuracy, and cost.

IAG’s experience also points to a broader truth: the winners won’t be the firms with the flashiest demos. They’ll be the ones that industrialise the unglamorous parts—data capture, validation, and integration—so every downstream AI system has something reliable to work with.

If you’re building AI for underwriting, claims, credit decisioning, or fraud detection, ask your team one simple question: How many systems does a person touch before a decision can even begin? Wherever that number is high, ingestion automation is usually the highest-ROI fix.

If you want to pressure-test your ingestion use case (fields, accuracy definitions, exception design, and governance), start with one question: what's the one workflow where shaving 30 minutes per case would immediately show up in revenue or customer experience?