Learn how a 92% accurate AI model improves routing, containment, and CSAT—plus a practical blueprint to build AI for contact centers.

Build a 92% Accurate AI Model for Customer Service
A 92% accurate customer service AI model isn’t “nice to have.” At retail scale—millions of orders, returns, store queries, delivery issues—it’s the difference between a contact center that’s permanently underwater and one that can keep service levels steady during peak demand.
Lidl’s reported 92% accuracy number is the headline, but the real story is how you get there without breaking trust, compliance, or your agents’ patience. Most teams try to buy a chatbot, point it at a knowledge base, and hope for the best. That approach usually collapses the moment customers ask messy, real-world questions.
This post is part of our AI in Customer Service & Contact Centers series. The focus here: what a “92% accurate model” actually implies, the steps that typically make accuracy achievable in production, and a practical blueprint you can apply whether you’re rolling out chatbots, voice assistants, or agent-assist in a modern contact center.
What “92% accuracy” should mean in a contact center
Accuracy in customer service AI is only meaningful when it’s tied to a specific task and a measurable outcome. If you don’t define the task, accuracy becomes a vanity metric.
In customer service and contact centers, accuracy is commonly measured in a few practical ways:
- Intent classification accuracy: Did the model correctly identify the reason for contact (e.g., “refund status,” “change delivery,” “store opening hours”)?
- Routing accuracy: Did the interaction land in the right queue or with the right specialist?
- Answer accuracy (retrieval quality): Did the model pull the correct policy/article snippet and present it correctly?
- Resolution accuracy: Did the model’s action or guidance result in a correct outcome (refund submitted, address updated, appointment scheduled)?
If Lidl achieved 92% accuracy, the most plausible interpretation (and the most useful for teams trying to replicate it) is high-performing intent detection and/or routing for a defined set of high-volume customer requests—often the first place AI pays off.
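To make the first of these concrete, here's a minimal sketch of scoring intent-classification accuracy against a human-labeled, held-out evaluation set. The keyword classifier, intents, and examples are illustrative stand-ins (not Lidl's actual system); the shape of the evaluation is the point:

```python
# Minimal sketch: intent-classification accuracy against a human-labeled,
# held-out evaluation set. The keyword rules below are a toy stand-in for
# whatever model you actually deploy; intents and examples are illustrative.

def classify_intent(text: str) -> str:
    """Toy stand-in for a production intent classifier."""
    text = text.lower()
    if "refund" in text or "money back" in text:
        return "refund_request"
    if "parcel" in text or "order" in text or "tracking" in text:
        return "order_status"
    if "open" in text or "close" in text or "hours" in text:
        return "store_hours"
    return "other"

labeled_eval_set = [
    ("where is my parcel", "order_status"),
    ("i want my money back", "refund_request"),
    ("what time does the store close", "store_hours"),
    ("can i use two promo codes", "promo_issue"),
]

correct = sum(1 for text, gold in labeled_eval_set
              if classify_intent(text) == gold)
print(f"Intent accuracy: {correct / len(labeled_eval_set):.0%}")  # 75% here
```

Whatever the model, the discipline is the same: a fixed, trusted evaluation set, scored per intent, never touched by training.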
A better metric stack than “accuracy” alone
If you want lasting impact, measure what leadership cares about. The teams that win internal buy-in translate model performance into contact center outcomes:
- Containment rate: % of contacts resolved without an agent
- Deflection rate: % of contacts prevented (self-serve success)
- AHT impact: Change in average handle time for agent-handled contacts
- Transfer rate: How often customers bounce between queues
- Reopen rate: Customers coming back for the same issue
- CSAT by intent: Not overall CSAT—CSAT per problem type
Here’s the stance I’ll take: A “92% accurate” model that increases reopen rates is not a success. It’s a liability.
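To ground those definitions, here's a minimal sketch that computes the stack from raw contact records. The field names are assumptions about what a typical contact-center export contains; map them to your own schema:

```python
# Hedged sketch: turning contact records into the metric stack above.
# Field names are assumptions about a typical contact-center export.

from collections import defaultdict

contacts = [
    {"intent": "refund_request", "contained": False, "transfers": 1,
     "reopened": True, "csat": 2},
    {"intent": "order_status", "contained": True, "transfers": 0,
     "reopened": False, "csat": 5},
    {"intent": "order_status", "contained": True, "transfers": 0,
     "reopened": False, "csat": 4},
]

n = len(contacts)
containment = sum(c["contained"] for c in contacts) / n
transfer_rate = sum(c["transfers"] > 0 for c in contacts) / n
reopen_rate = sum(c["reopened"] for c in contacts) / n

# CSAT per intent, not overall: the average hides the intents you fail on.
csat_by_intent = defaultdict(list)
for c in contacts:
    csat_by_intent[c["intent"]].append(c["csat"])

print(f"containment={containment:.0%} transfers={transfer_rate:.0%} "
      f"reopens={reopen_rate:.0%}")
for intent, scores in csat_by_intent.items():
    print(f"CSAT[{intent}] = {sum(scores) / len(scores):.1f}")
```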
How retailers actually reach 92%: the pipeline behind the number
High accuracy usually comes from process discipline, not fancy algorithms. The model matters, but the operating model matters more.
Below is the pipeline I’ve seen work best in retail and high-volume service environments—and it lines up with what you’d expect from an organization like Lidl that runs at scale.
1) Start where volume is high and language is repetitive
Pick 10–30 intents that represent a big slice of your contacts and have clear resolution paths. In retail, those are often:
- Order status and tracking
- Returns and refunds
- Payment and promo code issues
- Product availability
- Delivery rescheduling
- Store hours, store services, receipts
These intents are common, relatively structured, and measurable. They’re also the ones that spike during December peak, post-holiday returns, and weather-related disruptions—exactly the seasonal volatility contact centers deal with in late 2025.
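Finding that slice is mostly counting. A minimal sketch, assuming your ticket export carries a reason code per contact (field names are illustrative):

```python
# Illustrative sketch: rank historical contact reasons by volume to pick
# the first 10-30 intents to automate. Field names are assumptions.

from collections import Counter

tickets = [
    {"reason": "order_status"}, {"reason": "order_status"},
    {"reason": "returns_refunds"}, {"reason": "delivery_reschedule"},
    {"reason": "order_status"}, {"reason": "returns_refunds"},
]

counts = Counter(t["reason"] for t in tickets)
total = sum(counts.values())

# The top reasons with clear resolution paths are your first scope.
for reason, n in counts.most_common(30):
    print(f"{reason:22s} {n:5d}  ({n / total:.0%} of volume)")
```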
2) Fix labels before you “fix the model”
Most companies get this wrong: they train on messy historical dispositions and expect clean predictions. Contact center data is notorious for:
- Agents selecting the “closest” reason code to move on
- Different teams using different codes for the same issue
- Category drift over time (new policies, new delivery partners)
If you want 92% accuracy, you need a labeled dataset you trust. That usually means:
- Merging duplicate/overlapping intent categories
- Writing crisp intent definitions (“what counts / what doesn’t”)
- Doing a labeling sprint with QA + senior agents
- Auditing label consistency (inter-annotator agreement)
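For that last audit, Cohen's kappa is the standard check. A minimal sketch, assuming two annotators labeled the same sample of transcripts (labels are illustrative):

```python
# Hedged sketch: Cohen's kappa between two annotators as a label-consistency
# audit. Labels are illustrative; run this on a real double-labeled sample.

from collections import Counter

annotator_a = ["refund", "refund", "order_status", "other", "refund"]
annotator_b = ["refund", "order_status", "order_status", "other", "refund"]

n = len(annotator_a)
p_observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n

# Agreement expected by chance, from each annotator's label marginals.
freq_a, freq_b = Counter(annotator_a), Counter(annotator_b)
labels = set(annotator_a) | set(annotator_b)
p_expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"Cohen's kappa = {kappa:.2f}")  # ~0.69 on this toy sample
```

If kappa comes back low, the fix is usually sharper intent definitions, not more data.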
3) Design for escalation from day one
The fastest way to lose trust is to trap customers in automation. Good systems make escalation feel like a feature, not a failure.
What that looks like in practice:
- Confidence thresholds (e.g., route to an agent below 0.75 confidence)
- “Confirm intent” prompts only when necessary (don’t interrogate customers)
- Immediate agent handoff when sentiment drops or policy exceptions appear
- Short summary passed to agent (issue + intent + extracted entities)
In other words: your AI doesn’t need to be perfect. It needs to know when it’s not perfect.
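Here's a minimal sketch of that knowing-when, assuming a 0.75 confidence threshold, a sentiment score in [-1, 1], and a few trigger phrases. All three are illustrative and should be tuned per intent from your own error data:

```python
# Hedged sketch of escalation design: confidence threshold, sentiment and
# "agent please" triggers, plus a short context package for the agent.
# Threshold, phrases, and fields are illustrative assumptions.

from dataclasses import dataclass, field

AGENT_THRESHOLD = 0.75
ESCALATION_PHRASES = ("agent please", "speak to a human", "complaint")

@dataclass
class Handoff:
    """Short summary passed to the agent: issue + intent + entities."""
    summary: str
    intent: str
    confidence: float
    entities: dict = field(default_factory=dict)

def route(message: str, intent: str, confidence: float,
          sentiment: float, entities: dict) -> Handoff | None:
    """Return a Handoff when the bot should step aside, else None."""
    wants_agent = any(p in message.lower() for p in ESCALATION_PHRASES)
    if confidence < AGENT_THRESHOLD or sentiment < -0.5 or wants_agent:
        return Handoff(summary=f"Customer said: {message!r}",
                       intent=intent, confidence=confidence,
                       entities=entities)
    return None  # bot keeps handling the contact

# Escalates despite high model confidence, because the customer asked.
print(route("agent please, my parcel is lost", "order_status",
            0.91, -0.2, {"order_id": "A1234"}))
```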
4) Treat knowledge as a product, not a folder
A customer service AI model is only as good as the policies it can cite and the procedures it can execute. Retail policies change constantly (returns windows, delivery SLAs, substitutions, restricted items). That means your knowledge base needs:
- Ownership (who updates what, and when)
- Versioning (policy effective dates)
- Structure (short, scannable articles with clear rules)
- Testing (does the bot retrieve the right answer?)
If you’re using retrieval-augmented generation (RAG) for chatbots and voice assistants, the playbook is simple:
- Clean documents
- Chunk them consistently
- Add metadata (country, brand, channel, policy type)
- Evaluate retrieval separately from generation
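A minimal sketch of that playbook, with a toy keyword-overlap retriever standing in for real embedding search. The key design choice is the last step: retrieval gets its own score, so you know whether a wrong answer came from retrieval or generation:

```python
# Hedged RAG sketch: consistent chunking, per-chunk metadata, and retrieval
# evaluated on its own. The keyword-overlap retriever is a toy stand-in
# for embedding search; documents and metadata are illustrative.

def chunk(doc: str, size: int = 10) -> list[str]:
    """Split a cleaned document into consistent fixed-size word chunks."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Every chunk carries metadata so retrieval can filter by market/channel.
index = [
    {"text": c, "meta": {"country": "UK", "policy": "returns"}}
    for c in chunk("Items can be returned within 30 days with a receipt. "
                   "Perishable goods are excluded from returns.")
]

def retrieve(query: str, k: int = 1) -> list[dict]:
    """Toy retriever: rank chunks by keyword overlap with the query."""
    q = set(query.lower().split())
    return sorted(index,
                  key=lambda c: -len(q & set(c["text"].lower().split())))[:k]

# Evaluate retrieval separately: did the right chunk come back at all,
# regardless of what the generator later does with it?
gold = [("are perishable goods returnable", "Perishable")]
hits = sum(any(g in c["text"] for c in retrieve(q)) for q, g in gold)
print(f"retrieval recall@1 = {hits / len(gold):.0%}")
```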
5) Build continuous evaluation into the contact center rhythm
Accuracy isn’t something you achieve once; it’s something you maintain. New promos, new logistics issues, and new customer behavior will degrade performance.
A practical cadence that works:
- Weekly: review “low confidence” and “handoff” samples (see the sketch below)
- Biweekly: retrain intent model or update routing rules
- Monthly: intent taxonomy review (add/remove intents)
- Quarterly: policy and knowledge base audit
The contact center already runs on QA sampling. AI should plug into that discipline, not fight it.
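As one way to wire the weekly step into that QA rhythm, here's a small sketch that filters bot logs into a human review queue; the logging fields and the 0.6 review threshold are assumptions:

```python
# Sketch: build the weekly review queue from logged bot interactions.
# Field names and the 0.6 review threshold are illustrative assumptions.

import random

logs = [
    {"text": "promo code not working", "confidence": 0.41, "handoff": True},
    {"text": "track my order", "confidence": 0.97, "handoff": False},
    {"text": "refund for damaged item", "confidence": 0.58, "handoff": True},
]

review_queue = [l for l in logs if l["confidence"] < 0.6 or l["handoff"]]
for item in random.sample(review_queue, k=min(50, len(review_queue))):
    print(f"{item['confidence']:.2f}  {item['text']}")
```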
Where AI fits in the contact center: chatbot, voice, and agent-assist
A 92% accurate model is most valuable when it sits inside a broader contact center design. Don’t force one AI component to do everything.
Chatbots: great for structured tasks and status updates
Chatbots perform best when they:
- Verify identity quickly
- Pull order details from back-end systems
- Give short, direct answers
- Offer next-step buttons (“Start a return,” “Reschedule delivery”)
If you’re chasing containment, prioritize status and simple actions first. Customers will accept automation when it’s fast and correct.
Voice assistants: stronger when paired with tight routing
Voice is harder than chat because:
- People interrupt
- Audio quality varies
- Customers describe issues in longer narratives
A smart path, then, is voice triage plus routing before attempting full resolution in voice. Use the model to identify intent, capture key entities (order ID, postcode), then route to the right queue with context.
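A minimal sketch of that triage step. The keyword intent check stands in for a real model, and the regexes and queue names are illustrative:

```python
# Hedged sketch of voice triage: classify intent, extract key entities from
# the ASR transcript, and route with context rather than fully resolving.
# The keyword check, regexes, and queue names are illustrative assumptions.

import re

QUEUES = {"order_status": "logistics_team", "refund_request": "refunds_team"}

def find(pattern: str, text: str) -> str | None:
    """Return the first regex match in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

def triage(transcript: str) -> dict:
    """Turn an ASR transcript into a routing payload with context."""
    intent = ("refund_request" if "refund" in transcript.lower()
              else "order_status")  # stand-in for a real intent model
    return {
        "queue": QUEUES[intent],
        "intent": intent,
        "entities": {
            "order_id": find(r"\b[A-Z]\d{4,}\b", transcript),
            "postcode": find(r"\b[A-Z]{1,2}\d\w?\s?\d[A-Z]{2}\b", transcript),
        },
    }

print(triage("My order A12345 never arrived, postcode SW1A 1AA"))
```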
Agent-assist: the safest path to ROI in many teams
If you’re cautious about customer-facing automation, agent-assist is often the quickest win:
- Real-time suggested replies
- Policy snippets and step-by-step checklists
- Auto-summaries for after-call work
Here’s what I’ve found: Agent-assist can deliver measurable AHT and quality improvements even when customer-facing bots struggle. It’s also politically easier to roll out because it helps agents instead of “replacing” them.
A practical blueprint to replicate Lidl’s outcome
If your goal is a high-accuracy AI model for customer service, the path is straightforward—but not effortless. Here’s a proven implementation sequence that avoids the most common traps.
Step 1: Choose a narrow scope with clear success criteria
Define:
- Channels (chat, email, voice)
- Top intents (start with 10–30)
- Target metric (e.g., 85%+ intent accuracy in month 1; 92% by month 3)
- Business metric (containment + AHT + CSAT by intent)
Step 2: Build the dataset you wish you had
- Export 3–6 months of transcripts/tickets
- Remove sensitive fields (PII minimization)
- Label a representative sample per intent
- Include “other/unknown” as a real class (don’t force-fit)
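A tiny sketch of the PII-minimization step, with deliberately simple regexes (real redaction needs broader coverage, e.g., names and addresses, plus human review):

```python
# Illustrative sketch: strip obvious PII before labeling and training.
# Regexes are assumptions and deliberately simple; real PII minimization
# needs broader coverage (names, addresses) and review.

import re

PII_PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "phone": r"\+?\d[\d\s-]{7,}\d",
    "card": r"\b(?:\d[ -]?){13,16}\b",
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = re.sub(pattern, f"<{label}>", text)
    return text

print(redact("Call me on +44 7700 900123 or jane.doe@example.com"))
```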
Step 3: Implement guardrails and handoff design
- Confidence thresholds
- Escalation triggers (sentiment, repeats, “agent please”)
- Agent context package (summary + extracted entities)
Step 4: Launch with monitoring, not optimism
Operationalize monitoring:
- Daily dashboard: top intents, handoffs, fallbacks
- Weekly review: misclassifications and policy gaps
- Retraining queue: examples to label next
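A minimal sketch of that daily rollup, with assumed logging fields:

```python
# Sketch: daily dashboard rollup of top intents, handoff rate, and
# fallback rate from one day of bot logs. Field names are assumptions.

from collections import Counter

day_logs = [
    {"intent": "order_status", "handoff": False, "fallback": False},
    {"intent": "order_status", "handoff": True, "fallback": False},
    {"intent": "refund_request", "handoff": False, "fallback": True},
]

n = len(day_logs)
print("top intents:", Counter(l["intent"] for l in day_logs).most_common(5))
print(f"handoff rate:  {sum(l['handoff'] for l in day_logs) / n:.0%}")
print(f"fallback rate: {sum(l['fallback'] for l in day_logs) / n:.0%}")
```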
Step 5: Expand in rings, not in a big-bang rollout
Expand in this order:
- Intent routing
- Status and FAQs
- Simple transactions (returns initiation)
- Complex exceptions (partial refunds, delivery disputes)
This ringed rollout is how you protect CSAT while still pushing automation.
People Also Ask (and the honest answers)
Is 92% accuracy enough to automate customer support?
For routing and triage, yes. For full resolution, it depends on risk. High-stakes intents (payments, account security) need stricter thresholds and more agent oversight.
What’s the fastest way to improve accuracy in a contact center AI model?
Fix intent definitions and labels, then add better handoff rules. Model tweaks help, but clean taxonomy + training data usually gives the biggest jump.
Should I start with chatbots or agent-assist?
If your team is risk-averse or your knowledge base is messy, start with agent-assist. If you have clean transactional flows (order status, returns), start with chatbots.
The real lesson behind Lidl’s 92% accuracy
A 92% accurate AI model for customer service is a sign that the team treated AI like an operational capability: data discipline, clear intent taxonomy, strong escalation design, and continuous evaluation. The model is only one piece.
If you’re planning 2026 contact center upgrades, this is the bet I’d make: build for measurable outcomes (AHT, containment, CSAT by intent), and let accuracy serve those goals—not the other way around. That’s how you scale AI in customer service without burning customer trust.
If you want to pressure-test your own roadmap, start by listing your top 20 contact drivers and asking one simple question: Which of these can be resolved with a predictable policy and a predictable workflow? Those are your first candidates for a “92% accuracy” success story.