How AI assistant voices are chosen—and why it matters for customer service. Learn a practical framework to pick safe, trusted voices for voice AI.

Choosing AI Assistant Voices for Customer Experience
Voice is the fastest way to earn—or lose—trust in customer service. One awkward pause, one too-cheerful tone at the wrong moment, or one voice that sounds “borrowed” from a recognizable person, and customers stop focusing on the answer and start judging the experience.
That’s why the behind-the-scenes story of how ChatGPT’s voices were selected (and why one voice was paused) matters far beyond a single product update. It’s a real-world case study of how AI is powering digital services in the United States: pairing advanced voice AI with human creative work, and building governance around identity, consent, and brand trust.
This post is part of our AI in Customer Service & Contact Centers series, and I’m going to take a stance: most companies treat voice as a UI skin. It’s not. It’s product behavior. If you’re planning an AI voice assistant, IVR modernization, or voice bot for a contact center, the “voice casting” decisions you make will shape customer satisfaction as much as your model’s accuracy.
Why AI voice choices matter more than the script
Answer first: In a voice interface, how the AI speaks changes what customers believe, what they share, and whether they follow through.
In text chat, users can skim. In a call, they’re captive to pacing, tone, confidence, and timing. A voice that feels calm and competent reduces perceived effort. A voice that sounds overly performative can spike frustration, especially when someone’s calling about fraud, billing, or travel disruptions.
Three practical reasons voice selection has become a frontline customer experience decision:
- Trust is auditory. People judge credibility using tone, cadence, and warmth in seconds.
- Accessibility is real ROI. Voice mode isn’t just “nice to have”—it opens digital services to people with low vision, motor challenges, literacy barriers, or situational constraints (driving, cooking, working).
- Misidentification risk is brand risk. If a voice sounds like a celebrity (or like a specific employee), you’ve created an identity problem, not a design choice.
OpenAI’s approach—using professional voice actors, building voice criteria with casting directors, and explicitly stating that AI voices should not deliberately mimic a celebrity’s distinctive voice—maps closely to what regulated industries and high-volume contact centers need to do anyway.
A case study: how ChatGPT’s voices were selected (and what it signals)
Answer first: The selection process treated voice as a long-term product asset, not a one-off recording.
According to OpenAI’s published timeline, the company ran a five-month casting and production process involving professional voice actors, talent agencies, casting directors, and internal product/research review. They received 400+ submissions, narrowed to 14, and ultimately selected five voices (Breeze, Cove, Ember, Juniper, and Sky).
What “good voice” means in a real AI product
OpenAI worked with award-winning casting directors and producers to define criteria. The list is worth reading like a checklist for any customer-facing AI voice assistant:
- Timelessness (avoids sounding trendy or tied to a moment)
- Approachable + trust-inspiring
- Warm, engaging, confident tone
- Natural and easy to listen to
- Diverse backgrounds and multilingual capability
Here’s the part I like: these criteria aren’t about “prettiness.” They’re about durability—voices that hold up across thousands of interactions, across moods, across contexts.
Compensation for the creative community isn’t a footnote
OpenAI also stated that voice actors receive compensation above top-of-market rates for as long as their voices are used.
If you run a contact center, you already know the ethics and PR stakes around labor and automation. Paying talent fairly and setting clear, ongoing compensation terms is not just “the right thing.” It’s also how you avoid future disputes that interrupt your customer experience.
The “Sky” controversy and the lesson for U.S. digital services
Answer first: If customers think your AI voice is imitating a real person, you’ve already lost control of the narrative—even if you didn’t intend it.
OpenAI shared that it believes AI voices should not deliberately mimic a celebrity’s distinctive voice. It also stated that Sky’s voice was not Scarlett Johansson’s and was not intended to resemble hers, and that the voice actor was cast before outreach to Johansson. Still, OpenAI paused using Sky’s voice out of respect for Johansson’s concerns.
From a customer service and contact center standpoint, this is the lesson:
“Sound-alike” is a product risk category. Treat it like security or privacy, not like branding.
What to do instead: build an “audio identity” policy
If your organization is deploying voice AI—whether it’s for appointment scheduling, claims status, or tier-1 tech support—write an internal policy that answers:
- Who can the voice sound like? (Ideally: no one identifiable.)
- What’s the approval process for new voices?
- What data is used to train or tune the voice?
- What’s the escalation plan if a voice is flagged publicly?
Most teams obsess over prompt safety and forget that a voice itself can trigger a safety or legal response.
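If you want that policy enforced rather than aspirational, encode it as something your deployment tooling can check. Here’s a minimal sketch in Python; the VoicePolicy fields and the two-approver rule are assumptions to adapt, not a standard:

```python
from dataclasses import dataclass, field

@dataclass
class VoicePolicy:
    """An internal 'audio identity' record for one deployed AI voice (illustrative)."""
    voice_id: str
    sounds_like_identifiable_person: bool    # should be False for production voices
    approved_by: list[str] = field(default_factory=list)  # e.g., legal, brand, security
    training_data_sources: list[str] = field(default_factory=list)
    escalation_contact: str = ""             # who acts if the voice is flagged publicly

    def is_deployable(self) -> bool:
        # Block deployment unless every policy question has a documented answer.
        return (
            not self.sounds_like_identifiable_person
            and len(self.approved_by) >= 2   # assumption: a two-approver rule
            and bool(self.training_data_sources)
            and bool(self.escalation_contact)
        )

policy = VoicePolicy(
    voice_id="support-neutral-01",
    sounds_like_identifiable_person=False,
    approved_by=["legal", "brand"],
    training_data_sources=["licensed-actor-session-2025-03"],
    escalation_contact="voice-governance@yourco.example",
)
assert policy.is_deployable()
```

The point isn’t the specific fields. It’s that a voice with an unanswered policy question should fail a check before it ever reaches a customer.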
Designing Voice Mode for contact centers: what actually changes
Answer first: Voice AI isn’t “chat, but spoken.” It’s a different interaction model with different failure modes.
OpenAI described improvements in GPT-4o’s voice interactions: handling interruptions, managing group conversations, filtering background noise, and adapting to tone. Those features map directly to typical contact center chaos—kids in the background, customers talking over the agent, speakerphone echo, and emotionally loaded calls.
Interruption handling is the new hold music
In a phone call, interruptions happen constantly:
- Customers clarify mid-sentence
- The AI mishears a name and gets corrected
- Someone else in the room answers
If your voice bot can’t handle barge-in cleanly, it feels disrespectful. In practice, you want the following (see the sketch after this list):
- Barge-in enabled with clear recovery (“Got it—let me restart that part.”)
- Short confirmation loops (confirm critical items only: amounts, addresses, appointment times)
- A visible transcript for QA and compliance reviews (even if the customer never sees it)
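What clean recovery looks like in code depends on your telephony stack, but the control flow is simple. Here’s a minimal sketch; bot.say, bot.transcribe, and bot.listen_for are placeholders for whatever primitives your voice platform actually exposes:

```python
# Minimal barge-in recovery loop. The bot.* primitives are placeholders for
# your telephony/voice platform's real API; only the control flow is the point.
CRITICAL_SLOTS = {"amount", "address", "appointment_time"}

def speak_with_barge_in(bot, prompt_text: str) -> str | None:
    """Speak a prompt; if the caller interrupts, stop and return their words."""
    interruption = bot.say(prompt_text, allow_barge_in=True)  # caller audio, or None
    if interruption is None:
        return None                     # prompt played to completion
    bot.stop_speaking()                 # yield the floor immediately
    return bot.transcribe(interruption)

def confirm_if_critical(bot, slot: str, value: str) -> str:
    """Confirm only high-stakes items; accept everything else silently."""
    if slot not in CRITICAL_SLOTS:
        return value
    reply = speak_with_barge_in(bot, f"I have {value} for the {slot}. Is that right?")
    if reply and "no" in reply.lower():  # naive yes/no check; use real NLU in production
        bot.say("Got it, let me restart that part.", allow_barge_in=True)
        return bot.listen_for(slot)      # re-collect only the corrected item
    return value
```

The design choice that matters: the bot yields the floor the moment the caller speaks, and it re-confirms only the one item that was corrected, not the whole form.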
Tone adaptation is useful—until it isn’t
A voice assistant that can adapt tone can reduce escalations. But it can also sound creepy if it becomes overly “empathetic” without substance.
My rule: match urgency, not emotion.
- If someone’s angry about a charge, don’t “mirror anger.”
- Do speed up resolution: state what you can do, what you need, and the next step.
A practical pattern that works well:
- Acknowledge: “I can help with that charge.”
- Boundaries: “I can review the last 90 days of transactions.”
- Action: “Tell me the date and amount.”
That’s real empathy in customer service: competence.
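One way to operationalize “match urgency, not emotion” is to key responses off the intent rather than off detected sentiment. A sketch, with illustrative intents and copy:

```python
# Acknowledge / Boundaries / Action, keyed by intent. Intents and copy are illustrative.
PLAYBOOK = {
    "disputed_charge": {
        "acknowledge": "I can help with that charge.",
        "boundaries":  "I can review the last 90 days of transactions.",
        "action":      "Tell me the date and amount.",
    },
    "missed_flight": {
        "acknowledge": "I can help you rebook.",
        "boundaries":  "I can book the same route within the next 24 hours.",
        "action":      "Do you want the next available departure?",
    },
}

def respond(intent: str) -> str:
    steps = PLAYBOOK.get(intent)
    if steps is None:
        return "Let me connect you with someone who can help."  # safe fallback
    # Urgency shows up as pace and concreteness, not mirrored emotion.
    return " ".join([steps["acknowledge"], steps["boundaries"], steps["action"]])
```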
How to choose the right AI voice for your brand (a workable framework)
Answer first: Choose voices the way you choose support agents: by performance under stress, not by first impression.
If you’re building an AI voice assistant for customer support in the U.S., here’s a framework you can run in weeks—not months—without cutting corners.
1) Define the “moments that matter” calls
List your top 10 call intents and label them by emotional and regulatory weight:
- Low stakes: password reset, store hours, order status
- Medium: cancellations, refunds, delivery exceptions
- High: fraud, healthcare billing, insurance claims, account lockouts
Different intents may warrant different voices, or at least different speaking styles.
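The labeling can be as simple as a table your team reviews together, with a style picker keyed off the highest stake. A sketch with placeholder intents:

```python
# Top call intents labeled by emotional and regulatory weight (examples only).
INTENT_STAKES = {
    "password_reset":  {"emotional": "low",    "regulatory": "low"},
    "order_status":    {"emotional": "low",    "regulatory": "low"},
    "cancellation":    {"emotional": "medium", "regulatory": "low"},
    "refund":          {"emotional": "medium", "regulatory": "medium"},
    "fraud_report":    {"emotional": "high",   "regulatory": "high"},
    "insurance_claim": {"emotional": "high",   "regulatory": "high"},
}

def speaking_style(intent: str) -> str:
    """Pick a speaking style (not necessarily a different voice) per intent."""
    stakes = INTENT_STAKES.get(intent, {"emotional": "medium", "regulatory": "medium"})
    if "high" in stakes.values():
        return "steady-neutral"    # tighter pacing, minimal personality
    if "medium" in stakes.values():
        return "warm-efficient"
    return "warm-casual"
```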
2) Test voices with real scripts and real noise
OpenAI auditioned with scripts that included mindfulness, travel planning, and day-to-day conversation. For contact centers, your audition scripts should include:
- A compliance disclosure
- A misunderstanding and correction
- A high-stress scenario (fraud, missed flight, denied claim)
- A handoff to a human agent
Then test with realistic audio conditions (a noise-mixing sketch follows this list):
- Background cafe noise
- Speakerphone echo
- Regional accents common in your customer base
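You don’t need a lab for the noise tests. Here’s one way to mix a noise bed under an audition recording using pydub (which requires ffmpeg installed); the file paths are placeholders:

```python
# Mix background noise under recorded audition prompts for listening tests.
from pydub import AudioSegment

def make_noisy_clip(voice_path: str, noise_path: str, noise_gain_db: float = -12.0) -> AudioSegment:
    voice = AudioSegment.from_file(voice_path)
    noise = AudioSegment.from_file(noise_path) + noise_gain_db  # attenuate the noise bed
    # Loop the noise so it covers the whole prompt, then mix the voice on top.
    looped = (noise * (len(voice) // len(noise) + 1))[: len(voice)]
    return looped.overlay(voice)

clip = make_noisy_clip("audition_fraud_script.wav", "cafe_noise.wav")
clip.export("audition_fraud_cafe.wav", format="wav")
```

Run the same scripts through the same noise beds for every candidate voice, so what you’re comparing is the voice, not the conditions.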
3) Score voices on measurable CX signals
Don’t let “I like it” dominate. Use a simple scorecard:
- Comprehension: could users repeat the next step back after the call?
- Perceived wait time: did the pacing feel responsive, or did users report waiting?
- Trust: were users willing to share an account number when asked?
- Annoyance: did users interrupt or talk over the bot more than your baseline?
- Escalation rate: how often did users ask for a human?
Even a lightweight pilot can reveal that the most “pleasant” voice is not the most effective voice.
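To keep “I like it” from dominating, collapse the pilot results into one comparable number per voice. A minimal sketch; the metric names mirror the list above, and the weights are assumptions to tune:

```python
# Aggregate normalized pilot metrics (all scaled to 0..1) into one score per voice.
WEIGHTS = {
    "comprehension":    0.30,  # higher is better
    "perceived_speed":  0.15,  # higher is better
    "trust":            0.30,  # higher is better
    "annoyance":       -0.10,  # higher is worse
    "escalation_rate": -0.15,  # higher is worse
}

def voice_score(metrics: dict[str, float]) -> float:
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

pilot = {
    "candidate-a": {"comprehension": 0.92, "perceived_speed": 0.80, "trust": 0.85,
                    "annoyance": 0.20, "escalation_rate": 0.12},
    "candidate-b": {"comprehension": 0.88, "perceived_speed": 0.90, "trust": 0.70,
                    "annoyance": 0.35, "escalation_rate": 0.22},
}
ranked = sorted(pilot, key=lambda v: voice_score(pilot[v]), reverse=True)
print(ranked)  # the most "pleasant" voice may not rank first
```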
4) Put governance around identity and consent
Adopt rules similar to what OpenAI stated publicly:
- Don’t design voices to imitate celebrities or distinctive public figures
- Don’t use an employee’s voice without explicit, written, revocable consent
- Maintain a takedown process for any voice that triggers concerns
This is the kind of operational maturity buyers expect now—especially in financial services, healthcare, and large retail.
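The consent rule in particular is worth checking in code, not just stating in a document. A sketch of a revocable consent record; the fields are illustrative:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class VoiceConsent:
    """Written, revocable consent to use a person's recorded voice (illustrative)."""
    talent_id: str
    signed_on: date
    written_agreement_ref: str       # pointer to the signed document
    revoked_on: date | None = None   # set the moment talent withdraws consent

    def is_active(self) -> bool:
        return self.revoked_on is None

def can_use_voice(consent: VoiceConsent) -> bool:
    # No active consent, no deployment; takedown is the default, not the exception.
    return consent.is_active()
```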
People also ask: practical FAQs about AI voices in customer service
Is an AI voice assistant the same as a voice bot in a contact center?
They’re closely related. An AI voice assistant usually implies more natural conversation and context awareness, while a voice bot might be narrower (IVR replacement for a few intents). The voice design principles are the same.
Can I use one voice for every customer support scenario?
You can, but it’s rarely optimal. High-stress calls benefit from a steadier, more neutral tone and tighter pacing. Sales or onboarding can tolerate more warmth and personality.
What’s the biggest mistake teams make when launching Voice Mode?
They focus on the model and ignore operations: QA, escalation paths, monitoring, and a plan for voice-related complaints. Voice is not a set-and-forget interface.
What this means for AI-powered customer service in 2026
AI voice interfaces are becoming the default layer for digital services in the U.S.—especially on mobile, in cars, and in smart devices. The companies winning here won’t be the ones with the most “human-sounding” voice. They’ll be the ones with voices that are consistent, consented, and built for real customer moments.
OpenAI’s voice selection process highlights a direction the market is moving toward: professional casting, clear selection criteria, ongoing compensation, and a willingness to pause a voice when concerns arise. That’s not overcautious. It’s what responsible scale looks like.
If you’re planning or upgrading an AI voice assistant for your contact center, don’t start by asking “What voice sounds nice?” Start with: “What voice earns trust when the customer is having a bad day?”