Ask the expert: Why resolution is the real test of voice AI

Sarah Fox

Senior Content Producer

The moment Rob Reardon knew something had genuinely changed wasn't a benchmark hit or a product milestone. It was when callers started saying "have a good day" at the end of a call.

"You don't say that to a phone tree," he says. "Those small, unprompted courtesies were the proof point for me. We'd crossed the line from an automated system people tolerate to an intelligent AI agent people actually relate to."

Rob has spent several years as Ada's Principal Product Manager for Voice, building through one of the most significant technological shifts the industry has seen. He was drawn to the problem because it had no good solution.

"Like most people, I'd been burned by customer service voice automation, endlessly navigating a touch-tone IVR that felt less like help and more like a troll guarding a bridge, demanding a toll before it would finally let me reach a human."

He saw a chance not to incrementally improve that experience, but to reimagine it entirely. What's kept him here is that voice is, in his words, "genuinely hard in the most interesting way."

We spoke to Rob to learn more about how he thinks about agentic voice AI: the frameworks, the distinctions, and the inconvenient truths most of the industry is still missing.

Speech recognition is just the door

When companies talk about improving voice AI, the conversation usually lands on speech recognition. Rob thinks this is the wrong frame entirely.

"Speech recognition is just one thin slice of a five-star voice experience—necessary, but nowhere near sufficient," he says. "The frame is wrong because it optimizes the easiest-to-measure part while ignoring everything that actually determines whether a call succeeds."

His list of what actually determines it is more demanding. A genuinely great AI voice agent has to:

Handle real-world noise: a crowded airport, a café, a car with the windows down.
Read human speech patterns: the hesitations, the trailing off, the quiet "yeah" that means I'm following, keep going.
Know when to speak: distinguishing a real interruption from background chatter.
Deliver information thoughtfully: not dumping it all at once, and capturing hard-to-hear details like email addresses over bad phone audio.

Ultimately, Rob says, what matters most is whether the agent can actually resolve the issue.

What "agentic" actually means

There's a word that appears constantly in voice AI right now: agentic. Rob has a precise definition for what it means, and what it doesn't.

"Strip away the jargon and 'agentic' means one thing: the AI agent can actually do the thing, not just talk about it. A non-agentic system listens, maybe answers a question, and then routes you somewhere else to get the real work done. An agentic one understands your situation, reasons about it, and takes action on your behalf—across whatever systems that requires."

The clearest illustration is the scenario airlines dread most: a major disruption. It's also where voice AI for airlines faces its highest-stakes test. Let's say a passenger whose flight just got cancelled at midnight calls in:

A traditional system would route them through a menu, tell them to visit a website or wait on hold, and leave them hanging up more stressed than they started.
Agentic voice AI would let them say plainly what happened, for example, "my flight to Chicago got cancelled," and handle the rest: identifying who they are, seeing the disruption, finding the best re-route based on their preferences, rebooking them, and texting the new boarding pass before they hang up.

The difference is not how the AI agent sounds. It's what it can accomplish. If the traditional IVR is a decision tree, Ada's AI voice agent is a reasoner. The difference shows up most when things go wrong.

To illustrate the technical depth this requires, Rob walks through a seemingly simple scenario: a caller wants to apply a refund from a return in transit to a new order.

An AI agent needs to look up the return status, reason about policy, locate or create the new order, execute the transaction across potentially two different systems, and confirm back in plain language—all while the caller might interrupt or change their mind midway through. A simple Q&A bot can't do this. It has no memory of step one by the time it reaches step three.
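The steps in that scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ada's implementation: the `RETURNS` and `ORDERS` dictionaries stand in for two separate back-end systems, the policy rule is invented for the example, and every name (`CallState`, `handle_refund_transfer`, and so on) is hypothetical. The point it shows is that one state object travels through every step, so step three still knows what step one found.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical in-memory stand-ins for two separate back-end systems.
RETURNS = {
    "R-100": {"status": "in_transit", "amount": 40.0},
    "R-200": {"status": "label_created", "amount": 15.0},
}
ORDERS = {}


@dataclass
class CallState:
    """Context carried across every step of a single call."""
    return_id: str
    steps: list = field(default_factory=list)
    credit: float = 0.0
    order_id: Optional[str] = None


def policy_allows_credit(record):
    # Assumed policy for illustration: the return must already be shipped.
    return record["status"] in {"in_transit", "received"}


def handle_refund_transfer(return_id, order_total):
    """Look up the return, check policy, create the order, confirm."""
    state = CallState(return_id=return_id)

    record = RETURNS[return_id]                 # system 1: returns
    state.steps.append("looked_up_return")

    state.steps.append("checked_policy")
    if not policy_allows_credit(record):
        return state, "I can't apply that refund until the return has shipped."

    state.credit = min(record["amount"], order_total)
    state.order_id = f"O-{len(ORDERS) + 1}"     # system 2: orders
    ORDERS[state.order_id] = {"total": order_total, "credit": state.credit}
    state.steps.append("applied_credit")

    balance = order_total - state.credit
    state.steps.append("confirmed")
    return state, (
        f"Done. I applied ${state.credit:.2f} from your return to order "
        f"{state.order_id}. Your remaining balance is ${balance:.2f}."
    )
```

Calling `handle_refund_transfer("R-100", 65.0)` returns both the final state and a plain-language confirmation. A production agent would also have to handle interruptions and mid-call changes of mind, which this sketch leaves out.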

Latency isn't just a number

Most thinking about latency is simple: faster is better. Rob's view is more layered.

"In human conversation, we expect a response within two or three seconds. Past that, people assume they weren't heard and start to repeat themselves." Optimizing for latency, he says, isn't about chasing a number on a dashboard. It's about reassuring the caller that the system is alive, listening, and working.

"Milliseconds are really about emotional reassurance."

But not all latency is bad. When you ask a human agent to look something up, you expect them to take a moment, and you're fine with it, because you know work is happening. "The trick isn't eliminating every silence. It's making sure silence is never mistaken for a broken system."

This is where interstitial audio comes in: a subtle, purpose-built cue that turns dead air into "hang tight, I'm on it." It also prevents callers from hanging up during the seconds required for an AI agent to locate an account or execute a transaction.
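One way to picture the interstitial cue is as a timer racing the back-end work. The sketch below uses Python's standard `asyncio` library and is an illustration only: `respond_with_filler`, `run_demo`, the cue text, and the 50 ms threshold are all assumptions chosen to keep the demo fast, and a real system would play audio rather than append strings to a list.

```python
import asyncio

FILLER = "Hang tight, I'm on it."


async def respond_with_filler(work, play, filler_after=0.05):
    """Run `work`; if it is still running after `filler_after` seconds,
    play a short cue so the silence is not mistaken for a dropped call."""
    task = asyncio.create_task(work)
    done, _pending = await asyncio.wait({task}, timeout=filler_after)
    if not done:
        await play(FILLER)      # the lookup is slow: fill the dead air
    result = await task         # wait for the real answer either way
    await play(result)
    return result


def run_demo(lookup_seconds):
    """Simulate a lookup of the given duration and return what was played."""
    played = []

    async def play(text):
        played.append(text)     # stand-in for streaming audio to the caller

    async def lookup():
        await asyncio.sleep(lookup_seconds)
        return "I found your account."

    asyncio.run(respond_with_filler(lookup(), play))
    return played
```

With a slow lookup, `run_demo(0.3)` plays the cue and then the answer. With an instant one, `run_demo(0.0)` plays only the answer, so fast responses are never padded with filler.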

Containment vs. resolution vs. five-star resolution

Perhaps the sharpest framework Rob has developed is the distinction between three tiers of voice AI outcomes, and how conflating them has eroded trust in the industry.

Containment keeps the call away from a human agent. "It can be good, but it's often just deflection—telling the caller to go visit a web page to rebook. You haven't solved their problem; you've gotten them off the phone. For a long time this was the industry's main objective, and it's exactly why callers learned to distrust voice automation."
Resolution means the caller's problem is actually handled. In a flight disruption, that means getting them booked on a new flight—no other channel, no follow-up call, nothing left hanging. They came with a problem, and they leave without one.
Five-star resolution is a different tier altogether. "Here the conversation feels consultative. In a disruption, that might mean helping the caller find the best re-route for them, factoring in whether they have a time-of-day preference or need to be somewhere by a certain hour, not just the first available seat. It takes the experience from resolved to delightfully resolved."
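The gap between the three tiers becomes visible when the same set of calls is measured against each definition. The following sketch is a hypothetical illustration, not a metric Ada has published: the `CallRecord` fields are invented, and using a post-call satisfaction score of 5 as a proxy for "five-star" is an assumption made only so the example has something to count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallRecord:
    reached_human: bool          # did the call end with a human agent?
    issue_resolved: bool         # was the caller's problem actually handled?
    csat: Optional[int] = None   # assumed 1-5 survey score, if one was given


def tier_rates(calls):
    """Return the share of calls that meet each tier's definition."""
    total = len(calls)
    contained = [c for c in calls if not c.reached_human]
    resolved = [c for c in contained if c.issue_resolved]
    five_star = [c for c in resolved if c.csat == 5]  # proxy, see lead-in
    return {
        "containment": len(contained) / total,
        "resolution": len(resolved) / total,
        "five_star": len(five_star) / total,
    }


sample_calls = [
    CallRecord(reached_human=False, issue_resolved=False),          # deflected
    CallRecord(reached_human=False, issue_resolved=True, csat=4),   # resolved
    CallRecord(reached_human=False, issue_resolved=True, csat=5),   # five-star
    CallRecord(reached_human=True, issue_resolved=True, csat=5),    # escalated
]
rates = tier_rates(sample_calls)
```

On this made-up sample, containment is 75% while resolution is only 50%. The difference is the deflected call that stayed away from a human but left the caller's problem unsolved.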

The craft that doesn't show up in a demo

The details that separate an AI voice agent that works in a demo from one that works at 2 a.m. in a noisy terminal are, Rob admits, unglamorous.

Background noise filtering keeps the AI agent from transcribing airport chatter as if the caller were speaking.
Precise turn detection prevents it from talking over people or leaving silences that feel broken.
Latency savings—every saved fraction of a second—keep the rhythm of the conversation human.

"None of these are dramatic on their own, but they stack; each small friction chips away at the caller's confidence, and a few of them together is the difference between a call that feels effortless and one that feels broken. Getting them right is detailed engineering, and it's precisely what separates an AI voice agent that works in a demo from one that works in a real, messy environment."

The same precision applies to language. Ada's multilingual voice AI supports 42 languages natively, not through translation.

"The difference between native multilingual support and translation is the difference between thinking in a language and guessing at it," Rob says. A translation-based approach runs everything through an intermediary layer, and every hop adds latency and loses something: tone, nuance, idiom. "Building it natively means the AI agent understands and responds in the language directly, so it's faster, more accurate, and it actually sounds like it belongs in that language and culture."

During a disruption, the AI agent can even switch languages mid-call without dropping or restarting the conversation.

And when a call does end with a human, the handoff matters just as much. The AI handles authentication before a human ever picks up, routes to the right agent immediately, and passes full context so the agent knows the story before the caller even says hello.

What's next

The next frontier, in Rob's view, is outbound. Not sales calls, but what he describes as "support-first" transactional and informational service calls that are genuinely better when the company initiates them. Think appointment reminders, fraud alerts, delivery notifications, and renewal reminders.

"When the agent can reach out at the right moment, an enormous amount of customer effort and stress simply never materializes."

For airlines, the implication is direct: rather than waiting for passengers to call in during a disruption, the AI agent proactively reaches those affected and helps them sort out next steps before the queue ever builds.

And on the bigger question of where voice AI is heading, Rob pushes back on the conventional wisdom.

"The biggest misconception is the belief that people don't want to talk to companies, that everyone has migrated to digital and messaging, and that voice is a legacy channel on its way out. I think that's flatly wrong. The problem was never that people dislike voice, it's that companies made voice miserable."

Speech, he argues, is the original interface. Low-effort, natural, fast, and personal in a way typing never will be. When the experience is actually good, customers reach for it.

"The companies that get this right won't treat voice as a cost center to contain. They'll invest in enabling more voice conversations, because that's where the most human, highest-trust moments with their customers happen."

This is the shift most people are still missing, he notes. "And the ones who see it early are going to come out well ahead."
