How we achieve 99.9% success in complex multi-step voice actions through DFA-based state machines and P99 validation.
Reliability in AI is often treated as a probabilistic challenge — vendors speak of "success rates" and "accuracy scores" as if failure is an expected, acceptable outcome. At Vanira, we treat reliability as an engineering constraint. Sub-99% is not a benchmark; it is a bug. Solving the "stochastic problem" requires deliberately moving critical paths out of the LLM and into deterministic, verifiable code.
The stochastic problem is simple to state: LLMs are inherently non-deterministic. Give the same prompt to GPT-4 twice and you may get subtly different outputs. For generating a creative poem, this is desirable. For updating a Salesforce CRM record with a customer's account balance, this is catastrophic.
The Three Failure Modes of AI Actions
In our analysis of production voice AI systems, we identified three distinct failure modes for agentic actions. First: Parameter Hallucination — the LLM extracts a plausible but incorrect value (a wrong phone number, a misheard date). Second: State Confusion — the agent loses track of conversational context during a multi-step flow and calls the wrong tool or calls it in the wrong order. Third: Partial Execution — the action starts but fails mid-pipeline, leaving the system in an inconsistent state.
Each of these failure modes has a different root cause and requires a different engineering solution. Treating them all as "AI errors" and hoping the next model version fixes them is not an engineering strategy — it is a prayer.
P99 Outcome Verification
Every agentic action in Vanira undergoes a post-execution verification phase. Our orchestration layer validates the side-effects of a tool call against expected outcomes before confirming success to the voice agent. This closed-loop system is what delivers P99 reliability in production.
Concretely: when an agent books a calendar slot, we do not trust the 200 OK from the calendar API. We perform a read-back verification — querying the slot directly to confirm the booking exists with the correct parameters. Only then does the orchestrator send the success signal to the voice agent, which then confirms to the caller. This adds approximately 80ms to the action round-trip but eliminates an entire class of "confirmed but failed" bugs.
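A minimal Python sketch of the read-back pattern described above. `Booking`, `CalendarClient`, `create_booking`, `get_booking` and the `FlakyCalendar` fake are hypothetical names invented for illustration. They are not Vanira's API or any calendar provider's SDK. The only point the sketch makes is that success is reported when a read-back matches the request, not when the write returns 200.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Booking:
    slot_id: str
    attendee: str
    start_iso: str


class CalendarClient(Protocol):
    """Hypothetical calendar interface; not a real provider SDK."""

    def create_booking(self, booking: Booking) -> int: ...

    def get_booking(self, slot_id: str) -> Optional[Booking]: ...


def book_and_verify(client: CalendarClient, booking: Booking) -> bool:
    """Report success only if a read-back confirms the booking exists as requested."""
    if client.create_booking(booking) != 200:
        return False
    # Do not trust the 200 OK: query the slot and compare every field.
    stored = client.get_booking(booking.slot_id)
    return stored == booking


class FlakyCalendar:
    """In-memory fake that always returns 200 but can silently drop writes."""

    def __init__(self, drop_writes: bool) -> None:
        self.drop_writes = drop_writes
        self.slots: dict[str, Booking] = {}

    def create_booking(self, booking: Booking) -> int:
        if not self.drop_writes:
            self.slots[booking.slot_id] = booking
        return 200

    def get_booking(self, slot_id: str) -> Optional[Booking]:
        return self.slots.get(slot_id)


booking = Booking("slot-42", "caller@example.com", "2025-01-15T10:00:00Z")
print(book_and_verify(FlakyCalendar(drop_writes=False), booking))  # True
print(book_and_verify(FlakyCalendar(drop_writes=True), booking))   # False
```

The second call is the "confirmed but failed" case: the API says 200, the read-back finds nothing, and the agent is never told the booking succeeded.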
"Faith is for people; for AI actions, we use verification."
DFA-Based State Machines
For multi-step action flows — the complex ones, like "cancel my subscription and schedule a win-back call" — we model the entire action graph as a Deterministic Finite Automaton (DFA). Each node in the graph is a discrete action step. Each edge is a transition condition. The DFA cannot enter an invalid state. It cannot skip a required step. It cannot execute a compensation action without first completing the primary action.
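To make the DFA idea concrete, here is a hypothetical transition table for the cancellation example, sketched in Python. The state and event names are assumptions, not Vanira's schema. What matters is the structure. Every allowed (state, event) pair maps to exactly one next state, and any step without an edge is refused. An example is sending the email before the cancellation is confirmed.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    CRM_UPDATED = auto()
    CANCELLATION_CONFIRMED = auto()
    EMAIL_SENT = auto()
    WINBACK_ENROLLED = auto()
    DONE = auto()


# Hypothetical flow: each allowed (state, event) pair has exactly one next state.
TRANSITIONS: dict[tuple[State, str], State] = {
    (State.START, "update_crm"): State.CRM_UPDATED,
    (State.CRM_UPDATED, "confirm_cancellation"): State.CANCELLATION_CONFIRMED,
    (State.CANCELLATION_CONFIRMED, "send_email"): State.EMAIL_SENT,
    (State.EMAIL_SENT, "enroll_winback"): State.WINBACK_ENROLLED,
    (State.WINBACK_ENROLLED, "finish"): State.DONE,
}


class InvalidTransition(Exception):
    pass


class ActionDFA:
    def __init__(self) -> None:
        self.state = State.START

    def step(self, event: str) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            # No edge for this event: refuse it and leave the state unchanged.
            raise InvalidTransition(f"{event!r} not allowed from {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state


dfa = ActionDFA()
dfa.step("update_crm")
try:
    dfa.step("send_email")  # tries to skip the cancellation confirmation
except InvalidTransition as err:
    print(err)  # 'send_email' not allowed from CRM_UPDATED
print(dfa.state.name)  # CRM_UPDATED
```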
The DFA model sounds theoretical, but the practical impact is concrete. A DFA-modeled cancellation flow will always execute the CRM update before the confirmation email, will always trigger the win-back campaign enrollment after the cancellation is confirmed, and will always roll back cleanly if any step fails, leaving the data in a consistent state rather than a partial ghost state that requires manual cleanup.
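One common way to get clean-rollback behavior is compensating actions, often called the saga pattern. Each step carries an undo action. On failure, only the steps that already completed are undone, newest first. The sketch below assumes that approach. It is not a description of Vanira's implementation, and `Step` and `run_with_rollback` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_with_rollback(steps: list[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Compensate only steps whose primary action actually finished.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: list[str] = []


def send_email_fails() -> None:
    raise RuntimeError("email provider timeout")


steps = [
    Step("update_crm", lambda: log.append("crm:cancelled"), lambda: log.append("crm:restored")),
    Step("send_email", send_email_fails, lambda: log.append("email:noop")),
]
print(run_with_rollback(steps))  # False
print(log)  # ['crm:cancelled', 'crm:restored']
```

The failed email step is never compensated, because its primary action never completed. This mirrors the rule that a compensation action cannot run before its primary action.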
Stateless Failover
Vanira runs actions on stateless lambda runners distributed across 12 global edge regions. If a specific execution node fails mid-action, whether from hardware failure, network partition, or memory exhaustion, the action state has already been externalized and is immediately recoverable: the nearest available runner picks up from the last committed checkpoint with zero context loss.
The key architectural principle is that no critical state lives in the runner's memory. Every state transition is committed to a durable log before the next transition begins. This is what makes retries safe: if a runner executes the same step twice because of a network hiccup, the second execution finds the step already committed in the log and becomes a no-op.
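A minimal sketch of checkpointing with idempotent replay. It assumes a JSON Lines file as the durable log and a caller-supplied step ID per transition. `CheckpointLog` and `run_step` are hypothetical names. A production system would use a replicated store rather than a local file.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


class CheckpointLog:
    """Append-only log of committed steps, stored as JSON Lines on disk.

    Hypothetical stand-in for a durable, replicated store.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        self.committed: dict[str, str] = {}
        if path.exists():
            # A fresh runner rebuilds its view of the action purely from the log.
            for line in path.read_text().splitlines():
                entry = json.loads(line)
                self.committed[entry["step_id"]] = entry["result"]

    def commit(self, step_id: str, result: str) -> None:
        with self.path.open("a") as f:
            f.write(json.dumps({"step_id": step_id, "result": result}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the entry durable before moving on
        self.committed[step_id] = result


def run_step(log: CheckpointLog, step_id: str, fn: Callable[[], str]) -> str:
    """Execute a step once; replays of a committed step return the logged result."""
    if step_id in log.committed:
        return log.committed[step_id]
    result = fn()
    log.commit(step_id, result)
    return result


calls: list[str] = []


def book_slot() -> str:
    calls.append("book")
    return "slot-42"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "action-123.jsonl"
    run_step(CheckpointLog(path), "book_slot", book_slot)  # first runner
    # A second runner reloads the log from disk and retries the same step.
    print(run_step(CheckpointLog(path), "book_slot", book_slot))  # slot-42
print(calls)  # ['book']
```

The sketch also exposes a gap. If a runner crashes after `fn()` runs but before `commit` returns, the step runs again on retry. The log makes replay of committed steps a no-op. The downstream call itself still needs to tolerate a repeat, typically via an idempotency key.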
Measuring What Actually Matters
Most AI vendors report "accuracy" on curated benchmark datasets. We report verified completion rates on production calls. There is a significant difference. Our P99 success rate of 99.94% is measured on live customer calls including all the noisy, ambiguous, mid-sentence-changed-mind interactions that never appear in lab benchmarks. That number is the real moat.
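The sketch below illustrates the gap between reported and verified success by computing both rates over a made-up batch of 10,000 actions. The batch is sized to match the 99.94% figure: every action gets a 200 from its API, but 6 fail read-back verification. `ActionOutcome` and `verified_completion_rate` are hypothetical names, and the counts are illustrative, not Vanira data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionOutcome:
    api_reported_ok: bool  # what the downstream API returned
    verified: bool         # what the read-back check confirmed


def verified_completion_rate(outcomes: list[ActionOutcome]) -> float:
    """Share of attempted actions whose effect was confirmed by read-back.

    An action the API reported as successful but that failed verification
    counts as a failure, not a success.
    """
    if not outcomes:
        raise ValueError("no outcomes to measure")
    return sum(o.verified for o in outcomes) / len(outcomes)


# Illustrative batch: 9,994 verified actions, 6 "confirmed but failed" ones.
outcomes = [ActionOutcome(True, True)] * 9_994 + [ActionOutcome(True, False)] * 6
api_rate = sum(o.api_reported_ok for o in outcomes) / len(outcomes)
print(f"{api_rate:.2%}")                            # 100.00%
print(f"{verified_completion_rate(outcomes):.2%}")  # 99.94%
```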
Technical Engineering Specs
Verified successful completion rate of multi-step tool calls on live production traffic.
Globally distributed action runners for geographic fault tolerance and sub-200ms recovery.
Every tool execution is isolated in a secure, ephemeral container with strict resource limits.
Automatic failover and state recovery time during any node disruption.
Experience the Intelligence
Don't just read about the engineering. Test the Vanira Core directly in your browser. Our demo agent handles multi-step tool execution with the exact protocols described above.
Start Engineering Your Voice OS
Vanira is now in open beta. Create your agents, configure your tool-calls, and integrate the SDK in minutes.
