The moment Rob Reardon knew something had genuinely changed wasn't a benchmark hit or a product milestone. It was when callers started saying "have a good day" at the end of a call.
"You don't say that to a phone tree," he says. "Those small, unprompted courtesies were the proof point for me. We'd crossed the line from an automated system people tolerate to an intelligent AI agent people actually relate to."
Rob has spent several years as Ada's Principal Product Manager for Voice, building through one of the most significant technological shifts the industry has seen. He was drawn to the problem because it had no good solution.
"Like most people, I'd been burned by customer service voice automation, endlessly navigating a touch-tone IVR that felt less like help and more like a troll guarding a bridge, demanding a toll before it would finally let me reach a human."
He saw a chance not to incrementally improve that experience, but to reimagine it entirely. What's kept him here is that voice is, in his words, "genuinely hard in the most interesting way."
We spoke to Rob to learn more about how he thinks about agentic voice AI: the frameworks, the distinctions, and the inconvenient truths most of the industry is still missing.
When companies talk about improving voice AI, the conversation usually lands on speech recognition. Rob thinks this is the wrong frame entirely.
"Speech recognition is just one thin slice of a five-star voice experience—necessary, but nowhere near sufficient," he says. "The frame is wrong because it optimizes the easiest-to-measure part while ignoring everything that actually determines whether a call succeeds."
His list of what actually determines it is more demanding. A genuinely great AI voice agent has to:
Ultimately, Rob says the thing that matters most is: can it actually resolve the issue?

There's a word that appears constantly in voice AI right now: agentic. Rob has a precise definition for what it means, and what it doesn't.
"Strip away the jargon and 'agentic' means one thing: the AI agent can actually do the thing, not just talk about it. A non-agentic system listens, maybe answers a question, and then routes you somewhere else to get the real work done. An agentic one understands your situation, reasons about it, and takes action on your behalf—across whatever systems that requires."
The clearest illustration is the scenario airlines dread most: a major disruption. It's also where voice AI for airlines faces its highest-stakes test. Let's say a passenger whose flight just got cancelled at midnight calls in:
The difference is not how the AI agent sounds. It's what it can accomplish. If the traditional IVR is a decision tree, Ada's AI voice agent is a reasoner. The difference shows up most when things go wrong.
To illustrate the technical depth this requires, he walks through a seemingly simple scenario: a caller wants to apply a refund from a return in transit to a new order.
An AI agent needs to look up the return status, reason about policy, locate or create the new order, execute the transaction across potentially two different systems, and confirm back in plain language—all while the caller might interrupt or change their mind midway through. A simple Q&A bot can't do this. It has no memory of step one by the time it reaches step three.
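The steps in that scenario can be sketched in a few lines of Python. This is only an illustration of carrying state across steps, not Ada's implementation: the `CallState` structure, the in-memory `RETURNS` and `ORDERS` stores standing in for two backend systems, and the refund policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical in-memory stand-ins for two separate backend systems.
RETURNS = {"R-100": {"status": "in_transit", "amount": 40.0}}
ORDERS = {}

@dataclass
class CallState:
    """Memory carried across every step of one call (illustrative only)."""
    return_id: str
    refund_amount: Optional[float] = None
    order_id: Optional[str] = None
    log: List[str] = field(default_factory=list)

def look_up_return(state: CallState) -> dict:
    # Step 1: look up the return status in the returns system.
    record = RETURNS[state.return_id]
    state.log.append(f"return {state.return_id} is {record['status']}")
    return record

def policy_allows_credit(record: dict) -> bool:
    # Step 2: assumed policy, credit is allowed once the return has shipped.
    return record["status"] in {"in_transit", "received"}

def apply_refund_to_new_order(state: CallState) -> str:
    record = look_up_return(state)
    if not policy_allows_credit(record):
        return "I can't apply that refund yet, because the return hasn't shipped."
    # Steps 3 and 4: create the new order and apply the credit in the order system.
    state.refund_amount = record["amount"]
    state.order_id = f"O-{len(ORDERS) + 1}"
    ORDERS[state.order_id] = {"credit": state.refund_amount}
    state.log.append(f"credited {state.refund_amount} to {state.order_id}")
    # Step 5: confirm back in plain language, using what was learned in step 1.
    return (f"Done. I applied ${state.refund_amount:.2f} from return "
            f"{state.return_id} to new order {state.order_id}.")
```

Because `state` travels through every step, the confirmation at the end can still refer to the return looked up at the start, which is the memory a simple Q&A bot lacks.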

Most thinking about latency is simple: faster is better. Rob's view is more layered.
"In human conversation, we expect a response within two or three seconds. Past that, people assume they weren't heard and start to repeat themselves." Optimizing for latency, he says, isn't about chasing a number on a dashboard. It's about reassuring the caller that the system is alive, listening, and working.
"Milliseconds are really about emotional reassurance."
But not all latency is bad. When you ask a human agent to look something up, you expect them to take a moment, and you're fine with it, because you know work is happening. "The trick isn't eliminating every silence. It's making sure silence is never mistaken for a broken system."
This is where interstitial audio comes in: a subtle, purpose-built cue that turns dead air into "hang tight, I'm on it." It also prevents callers from hanging up during the seconds required for an AI agent to locate an account or execute a transaction.
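One way to picture the idea is a small asyncio sketch: start the slow work, and play a cue only if it has not finished within a silence budget. The function name `run_with_interstitial`, the `play_cue` callback, and the one-second default are hypothetical choices for illustration, not a description of how Ada's product works.

```python
import asyncio

async def run_with_interstitial(work, play_cue, silence_budget=1.0):
    """Run `work`; if it exceeds `silence_budget` seconds, play a cue while waiting.

    `work` is an awaitable (e.g. an account lookup) and `play_cue` is an async
    callable that plays the "hang tight" audio. Both are assumed interfaces.
    """
    task = asyncio.ensure_future(work)
    # Wait up to the budget; `done` is empty if the work is still running.
    done, _pending = await asyncio.wait({task}, timeout=silence_budget)
    if not done:
        await play_cue()  # fill the silence so the caller knows work is happening
    return await task
```

Fast lookups return without any cue, so the caller only hears the interstitial audio when there would otherwise be noticeable dead air.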
Perhaps the sharpest framework Rob has developed is the distinction between three tiers of voice AI outcomes, and how conflating them has eroded trust in the industry.

The details that separate an AI voice agent that works in a demo from one that works at 2 a.m. in a noisy terminal are, Rob admits, unglamorous.
"None of these are dramatic on their own, but they stack; each small friction chips away at the caller's confidence, and a few of them together is the difference between a call that feels effortless and one that feels broken. Getting them right is detailed engineering, and it's precisely what separates an AI voice agent that works in a demo from one that works in a real, messy environment."
The same precision applies to language. Ada's multilingual voice AI supports 42 languages natively, not through translation.
"The difference between native multilingual support and translation is the difference between thinking in a language and guessing at it," Rob says. A translation-based approach runs everything through an intermediary layer, and every hop adds latency and loses something: tone, nuance, idiom. "Building it natively means the AI agent understands and responds in the language directly, so it's faster, more accurate, and it actually sounds like it belongs in that language and culture."
During a disruption, the AI agent can even switch languages mid-call without dropping or restarting the conversation.
And when a call does end with a human, the handoff matters just as much. The AI handles authentication before a human ever picks up, routes to the right agent immediately, and passes full context so the agent knows the story before the caller even says hello.

The next frontier, in Rob's view, is outbound. Not sales calls, but what he describes as "support-first" transactional and informational service calls that are genuinely better when the company initiates them. Think appointment reminders, fraud alerts, delivery notifications, and renewal reminders.
"When the agent can reach out at the right moment, an enormous amount of customer effort and stress simply never materializes."
For airlines, the implication is direct: rather than waiting for passengers to call in during a disruption, the AI agent proactively reaches those affected and helps them sort out next steps before the queue ever builds.
And on the bigger question of where voice AI is heading, Rob pushes back on the conventional wisdom.
"The biggest misconception is the belief that people don't want to talk to companies, that everyone has migrated to digital and messaging, and that voice is a legacy channel on its way out. I think that's flatly wrong. The problem was never that people dislike voice, it's that companies made voice miserable."
Speech, he argues, is the original interface. Low-effort, natural, fast, and personal in a way typing never will be. When the experience is actually good, customers reach for it.
"The companies that get this right won't treat voice as a cost center to contain. They'll invest in enabling more voice conversations, because that's where the most human, highest-trust moments with their customers happen."
This is the shift most people are still missing, he notes. "And the ones who see it early are going to come out well ahead."