Learn how a 92% accurate AI model improves routing, containment, and CSAT—plus a practical blueprint to build AI for contact centers.

Build a 92% Accurate AI Model for Customer Service
A 92% accurate customer service AI model isn’t “nice to have.” At retail scale—millions of orders, returns, store queries, delivery issues—it’s the difference between a contact center that’s permanently underwater and one that can keep service levels steady during peak demand.
Lidl’s reported 92% accuracy number is the headline, but the real story is how you get there without breaking trust, compliance, or your agents’ patience. Most teams try to buy a chatbot, point it at a knowledge base, and hope for the best. That approach usually collapses the moment customers ask messy, real-world questions.
This post is part of our AI in Customer Service & Contact Centers series. The focus here: what a “92% accurate model” actually implies, the steps that typically make accuracy achievable in production, and a practical blueprint you can apply whether you’re rolling out chatbots, voice assistants, or agent-assist in a modern contact center.
What “92% accuracy” should mean in a contact center
Accuracy in customer service AI is only meaningful when it’s tied to a specific task and a measurable outcome. If you don’t define the task, accuracy becomes a vanity metric.
In customer service and contact centers, accuracy is commonly measured in a few practical ways:
- Intent classification accuracy: Did the model correctly identify the reason for contact (e.g., “refund status,” “change delivery,” “store opening hours”)?
- Routing accuracy: Did the interaction land in the right queue or with the right specialist?
- Answer accuracy (retrieval quality): Did the model pull the correct policy/article snippet and present it correctly?
- Resolution accuracy: Did the model’s action or guidance result in a correct outcome (refund submitted, address updated, appointment scheduled)?
If Lidl achieved 92% accuracy, the most plausible interpretation (and the most useful for teams trying to replicate it) is high-performing intent detection and/or routing for a defined set of high-volume customer requests—often the first place AI pays off.
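To make the first of these concrete, here's a minimal sketch of scoring intent-classification accuracy against a human-labeled, held-out evaluation set. The keyword classifier, intents, and examples are illustrative stand-ins (not Lidl's actual system); the shape of the evaluation is the point:

```python
# Minimal sketch: intent-classification accuracy against a human-labeled,
# held-out evaluation set. The keyword rules below are a toy stand-in for
# whatever model you actually deploy; intents and examples are illustrative.

def classify_intent(text: str) -> str:
    """Toy stand-in for a production intent classifier."""
    text = text.lower()
    if "refund" in text or "money back" in text:
        return "refund_request"
    if "parcel" in text or "order" in text or "tracking" in text:
        return "order_status"
    if "open" in text or "close" in text or "hours" in text:
        return "store_hours"
    return "other"

labeled_eval_set = [
    ("where is my parcel", "order_status"),
    ("i want my money back", "refund_request"),
    ("what time does the store close", "store_hours"),
    ("can i use two promo codes", "promo_issue"),
]

correct = sum(1 for text, gold in labeled_eval_set
              if classify_intent(text) == gold)
print(f"Intent accuracy: {correct / len(labeled_eval_set):.0%}")  # 75% here
```

Whatever the model, the discipline is the same: a fixed, trusted evaluation set, scored per intent, never touched by training.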
A better metric stack than “accuracy” alone
If you want lasting impact, measure what leadership cares about. The teams that win internal buy-in translate model performance into contact center outcomes:
- Containment rate: % of contacts resolved without an agent
- Deflection rate: % of contacts prevented (self-serve success)
- AHT impact: Change in average handle time for agent-handled contacts
- Transfer rate: How often customers bounce between queues
- Reopen rate: Customers coming back for the same issue
- CSAT by intent: Not overall CSAT—CSAT per problem type
Here’s the stance I’ll take: A “92% accurate” model that increases reopen rates is not a success. It’s a liability.
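To ground those definitions, here's a minimal sketch that computes the stack from raw contact records. The field names are assumptions about what a typical contact-center export contains; map them to your own schema:

```python
# Hedged sketch: turning contact records into the metric stack above.
# Field names are assumptions about a typical contact-center export.

from collections import defaultdict

contacts = [
    {"intent": "refund_request", "contained": False, "transfers": 1,
     "reopened": True, "csat": 2},
    {"intent": "order_status", "contained": True, "transfers": 0,
     "reopened": False, "csat": 5},
    {"intent": "order_status", "contained": True, "transfers": 0,
     "reopened": False, "csat": 4},
]

n = len(contacts)
containment = sum(c["contained"] for c in contacts) / n
transfer_rate = sum(c["transfers"] > 0 for c in contacts) / n
reopen_rate = sum(c["reopened"] for c in contacts) / n

# CSAT per intent, not overall: the average hides the intents you fail on.
csat_by_intent = defaultdict(list)
for c in contacts:
    csat_by_intent[c["intent"]].append(c["csat"])

print(f"containment={containment:.0%} transfers={transfer_rate:.0%} "
      f"reopens={reopen_rate:.0%}")
for intent, scores in csat_by_intent.items():
    print(f"CSAT[{intent}] = {sum(scores) / len(scores):.1f}")
```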
How retailers actually reach 92%: the pipeline behind the number
High accuracy usually comes from process discipline, not fancy algorithms. The model matters, but the operating model matters more.
Below is the pipeline I’ve seen work best in retail and high-volume service environments—and it lines up with what you’d expect from an organization like Lidl that runs at scale.
1) Start where volume is high and language is repetitive
Pick 10–30 intents that represent a big slice of your contacts and have clear resolution paths. In retail, those are often:
- Order status and tracking
- Returns and refunds
- Payment and promo code issues
- Product availability
- Delivery rescheduling
- Store hours, store services, receipts
These intents are common, relatively structured, and measurable. They’re also the ones that spike during December peak, post-holiday returns, and weather-related disruptions—exactly the seasonal volatility contact centers deal with in late 2025.
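Finding that slice is mostly counting. A minimal sketch, assuming your ticket export carries a reason code per contact (field names are illustrative):

```python
# Illustrative sketch: rank historical contact reasons by volume to pick
# the first 10-30 intents to automate. Field names are assumptions.

from collections import Counter

tickets = [
    {"reason": "order_status"}, {"reason": "order_status"},
    {"reason": "returns_refunds"}, {"reason": "delivery_reschedule"},
    {"reason": "order_status"}, {"reason": "returns_refunds"},
]

counts = Counter(t["reason"] for t in tickets)
total = sum(counts.values())

# The top reasons with clear resolution paths are your first scope.
for reason, n in counts.most_common(30):
    print(f"{reason:22s} {n:5d}  ({n / total:.0%} of volume)")
```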
2) Fix labels before you “fix the model”
Most companies get this wrong: they train on messy historical dispositions and expect clean predictions. Contact center data is notorious for:
- Agents selecting the “closest” reason code to move on
- Different teams using different codes for the same issue
- Category drift over time (new policies, new delivery partners)
If you want 92% accuracy, you need a labeled dataset you trust. That usually means:
- Merging duplicate/overlapping intent categories
- Writing crisp intent definitions (“what counts / what doesn’t”)
- Doing a labeling sprint with QA + senior agents
- Auditing label consistency (inter-annotator agreement)
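For that last audit, Cohen's kappa is the standard check. A minimal sketch, assuming two annotators labeled the same sample of transcripts (labels are illustrative):

```python
# Hedged sketch: Cohen's kappa between two annotators as a label-consistency
# audit. Labels are illustrative; run this on a real double-labeled sample.

from collections import Counter

annotator_a = ["refund", "refund", "order_status", "other", "refund"]
annotator_b = ["refund", "order_status", "order_status", "other", "refund"]

n = len(annotator_a)
p_observed = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n

# Agreement expected by chance, from each annotator's label marginals.
freq_a, freq_b = Counter(annotator_a), Counter(annotator_b)
labels = set(annotator_a) | set(annotator_b)
p_expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"Cohen's kappa = {kappa:.2f}")  # ~0.69 on this toy sample
```

If kappa comes back low, the fix is usually sharper intent definitions, not more data.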
3) Design for escalation from day one
The fastest way to lose trust is to trap customers in automation. Good systems make escalation feel like a feature, not a failure.
What that looks like in practice:
- Confidence thresholds (e.g., route to an agent below 0.75 confidence)
- “Confirm intent” prompts only when necessary (don’t interrogate customers)
- Immediate agent handoff when sentiment drops or policy exceptions appear
- Short summary passed to agent (issue + intent + extracted entities)
In other words: your AI doesn’t need to be perfect. It needs to know when it’s not perfect.
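Here's a minimal sketch of that knowing-when, assuming a 0.75 confidence threshold, a sentiment score in [-1, 1], and a few trigger phrases. All three are illustrative and should be tuned per intent from your own error data:

```python
# Hedged sketch of escalation design: confidence threshold, sentiment and
# "agent please" triggers, plus a short context package for the agent.
# Threshold, phrases, and fields are illustrative assumptions.

from dataclasses import dataclass, field

AGENT_THRESHOLD = 0.75
ESCALATION_PHRASES = ("agent please", "speak to a human", "complaint")

@dataclass
class Handoff:
    """Short summary passed to the agent: issue + intent + entities."""
    summary: str
    intent: str
    confidence: float
    entities: dict = field(default_factory=dict)

def route(message: str, intent: str, confidence: float,
          sentiment: float, entities: dict) -> Handoff | None:
    """Return a Handoff when the bot should step aside, else None."""
    wants_agent = any(p in message.lower() for p in ESCALATION_PHRASES)
    if confidence < AGENT_THRESHOLD or sentiment < -0.5 or wants_agent:
        return Handoff(summary=f"Customer said: {message!r}",
                       intent=intent, confidence=confidence,
                       entities=entities)
    return None  # bot keeps handling the contact

# Escalates despite high model confidence, because the customer asked.
print(route("agent please, my parcel is lost", "order_status",
            0.91, -0.2, {"order_id": "A1234"}))
```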
4) Treat knowledge as a product, not a folder
A customer service AI model is only as good as the policies it can cite and the procedures it can execute. Retail policies change constantly (returns windows, delivery SLAs, substitutions, restricted items). That means your knowledge base needs:
- Ownership (who updates what, and when)
- Versioning (policy effective dates)
- Structure (short, scannable articles with clear rules)
- Testing (does the bot retrieve the right answer?)
If you’re using retrieval-augmented generation (RAG) for chatbots and voice assistants, the playbook is simple:
- Clean documents
- Chunk them consistently
- Add metadata (country, brand, channel, policy type)
- Evaluate retrieval separately from generation
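A minimal sketch of that playbook, with a toy keyword-overlap retriever standing in for real embedding search. The key design choice is the last step: retrieval gets its own score, so you know whether a wrong answer came from retrieval or generation:

```python
# Hedged RAG sketch: consistent chunking, per-chunk metadata, and retrieval
# evaluated on its own. The keyword-overlap retriever is a toy stand-in
# for embedding search; documents and metadata are illustrative.

def chunk(doc: str, size: int = 10) -> list[str]:
    """Split a cleaned document into consistent fixed-size word chunks."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Every chunk carries metadata so retrieval can filter by market/channel.
index = [
    {"text": c, "meta": {"country": "UK", "policy": "returns"}}
    for c in chunk("Items can be returned within 30 days with a receipt. "
                   "Perishable goods are excluded from returns.")
]

def retrieve(query: str, k: int = 1) -> list[dict]:
    """Toy retriever: rank chunks by keyword overlap with the query."""
    q = set(query.lower().split())
    return sorted(index,
                  key=lambda c: -len(q & set(c["text"].lower().split())))[:k]

# Evaluate retrieval separately: did the right chunk come back at all,
# regardless of what the generator later does with it?
gold = [("are perishable goods returnable", "Perishable")]
hits = sum(any(g in c["text"] for c in retrieve(q)) for q, g in gold)
print(f"retrieval recall@1 = {hits / len(gold):.0%}")
```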
5) Build continuous evaluation into the contact center rhythm
Accuracy isn’t something you achieve once; it’s something you maintain. New promos, new logistics issues, and new customer behavior will degrade performance.
A practical cadence that works:
- Weekly: review “low confidence” and “handoff” samples (see the sketch below)
- Biweekly: retrain intent model or update routing rules
- Monthly: intent taxonomy review (add/remove intents)
- Quarterly: policy and knowledge base audit
The contact center already runs on QA sampling. AI should plug into that discipline, not fight it.
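As one way to wire the weekly step into that QA rhythm, here's a small sketch that filters bot logs into a human review queue; the logging fields and the 0.6 review threshold are assumptions:

```python
# Sketch: build the weekly review queue from logged bot interactions.
# Field names and the 0.6 review threshold are illustrative assumptions.

import random

logs = [
    {"text": "promo code not working", "confidence": 0.41, "handoff": True},
    {"text": "track my order", "confidence": 0.97, "handoff": False},
    {"text": "refund for damaged item", "confidence": 0.58, "handoff": True},
]

review_queue = [l for l in logs if l["confidence"] < 0.6 or l["handoff"]]
for item in random.sample(review_queue, k=min(50, len(review_queue))):
    print(f"{item['confidence']:.2f}  {item['text']}")
```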
Where AI fits in the contact center: chatbot, voice, and agent-assist
A 92% accurate model is most valuable when it sits inside a broader contact center design. Don’t force one AI component to do everything.
Chatbots: great for structured tasks and status updates
Chatbots perform best when they:
- Verify identity quickly
- Pull order details from back-end systems
- Give short, direct answers
- Offer next-step buttons (“Start a return,” “Reschedule delivery”)
If you’re chasing containment, prioritize status and simple actions first. Customers will accept automation when it’s fast and correct.
Voice assistants: stronger when paired with tight routing
Voice is harder than chat because:
- People interrupt
- Audio quality varies
- Customers describe issues in longer narratives
A smart path, then, is voice triage plus routing before attempting full resolution in voice. Use the model to identify intent, capture key entities (order ID, postcode), then route to the right queue with context.
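A minimal sketch of that triage step. The keyword intent check stands in for a real model, and the regexes and queue names are illustrative:

```python
# Hedged sketch of voice triage: classify intent, extract key entities from
# the ASR transcript, and route with context rather than fully resolving.
# The keyword check, regexes, and queue names are illustrative assumptions.

import re

QUEUES = {"order_status": "logistics_team", "refund_request": "refunds_team"}

def find(pattern: str, text: str) -> str | None:
    """Return the first regex match in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

def triage(transcript: str) -> dict:
    """Turn an ASR transcript into a routing payload with context."""
    intent = ("refund_request" if "refund" in transcript.lower()
              else "order_status")  # stand-in for a real intent model
    return {
        "queue": QUEUES[intent],
        "intent": intent,
        "entities": {
            "order_id": find(r"\b[A-Z]\d{4,}\b", transcript),
            "postcode": find(r"\b[A-Z]{1,2}\d\w?\s?\d[A-Z]{2}\b", transcript),
        },
    }

print(triage("My order A12345 never arrived, postcode SW1A 1AA"))
```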
Agent-assist: the safest path to ROI in many teams
If you’re cautious about customer-facing automation, agent-assist is often the quickest win:
- Real-time suggested replies
- Policy snippets and step-by-step checklists
- Auto-summaries for after-call work
Here’s what I’ve found: Agent-assist can deliver measurable AHT and quality improvements even when customer-facing bots struggle. It’s also politically easier to roll out because it helps agents instead of “replacing” them.
A practical blueprint to replicate Lidl’s outcome
If your goal is a high-accuracy AI model for customer service, the path is straightforward—but not effortless. Here’s a proven implementation sequence that avoids the most common traps.
Step 1: Choose a narrow scope with clear success criteria
Define:
- Channels (chat, email, voice)
- Top intents (start with 10–30)
- Target metric (e.g., 85%+ intent accuracy in month 1; 92% by month 3)
- Business metric (containment + AHT + CSAT by intent)
Step 2: Build the dataset you wish you had
- Export 3–6 months of transcripts/tickets
- Remove sensitive fields (PII minimization)
- Label a representative sample per intent
- Include “other/unknown” as a real class (don’t force-fit)
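A tiny sketch of the PII-minimization step, with deliberately simple regexes (real redaction needs broader coverage, e.g., names and addresses, plus human review):

```python
# Illustrative sketch: strip obvious PII before labeling and training.
# Regexes are assumptions and deliberately simple; real PII minimization
# needs broader coverage (names, addresses) and review.

import re

PII_PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "phone": r"\+?\d[\d\s-]{7,}\d",
    "card": r"\b(?:\d[ -]?){13,16}\b",
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = re.sub(pattern, f"<{label}>", text)
    return text

print(redact("Call me on +44 7700 900123 or jane.doe@example.com"))
```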
Step 3: Implement guardrails and handoff design
- Confidence thresholds
- Escalation triggers (sentiment, repeats, “agent please”)
- Agent context package (summary + extracted entities)
Step 4: Launch with monitoring, not optimism
Operationalize monitoring:
- Daily dashboard: top intents, handoffs, fallbacks
- Weekly review: misclassifications and policy gaps
- Retraining queue: examples to label next
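A minimal sketch of that daily rollup, with assumed logging fields:

```python
# Sketch: daily dashboard rollup of top intents, handoff rate, and
# fallback rate from one day of bot logs. Field names are assumptions.

from collections import Counter

day_logs = [
    {"intent": "order_status", "handoff": False, "fallback": False},
    {"intent": "order_status", "handoff": True, "fallback": False},
    {"intent": "refund_request", "handoff": False, "fallback": True},
]

n = len(day_logs)
print("top intents:", Counter(l["intent"] for l in day_logs).most_common(5))
print(f"handoff rate:  {sum(l['handoff'] for l in day_logs) / n:.0%}")
print(f"fallback rate: {sum(l['fallback'] for l in day_logs) / n:.0%}")
```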
Step 5: Expand in rings, not in a big-bang rollout
Expand in this order:
- Intent routing
- Status and FAQs
- Simple transactions (returns initiation)
- Complex exceptions (partial refunds, delivery disputes)
This ringed rollout is how you protect CSAT while still pushing automation.
People Also Ask (and the honest answers)
Is 92% accuracy enough to automate customer support?
For routing and triage, yes. For full resolution, it depends on risk. High-stakes intents (payments, account security) need stricter thresholds and more agent oversight.
What’s the fastest way to improve accuracy in a contact center AI model?
Fix intent definitions and labels, then add better handoff rules. Model tweaks help, but clean taxonomy + training data usually gives the biggest jump.
Should I start with chatbots or agent-assist?
If your team is risk-averse or your knowledge base is messy, start with agent-assist. If you have clean transactional flows (order status, returns), start with chatbots.
The real lesson behind Lidl’s 92% accuracy
A 92% accurate AI model for customer service is a sign that the team treated AI like an operational capability: data discipline, clear intent taxonomy, strong escalation design, and continuous evaluation. The model is only one piece.
If you’re planning 2026 contact center upgrades, this is the bet I’d make: build for measurable outcomes (AHT, containment, CSAT by intent), and let accuracy serve those goals—not the other way around. That’s how you scale AI in customer service without burning customer trust.
If you want to pressure-test your own roadmap, start by listing your top 20 contact drivers and asking one simple question: Which of these can be resolved with a predictable policy and a predictable workflow? Those are your first candidates for a “92% accuracy” success story.