The voice agent that sells voice agents

A bilingual voice agent on a real phone number runs the discovery call itself, files a structured briefing in Notion before the call ends, and is bounded by a database-enforced spend cap.

Timeline: Since May 2026
Duration: 2 weeks
Sector: Technology

EN · PT

bilingual, on a live phone line

Axera Flow · Technology

Before

The problem

Selling AI voice agents is a credibility problem before it's a sales problem. Prospects who've never used one don't trust demo videos. Meanwhile the warming Canadian pipeline meant unpaid 30–45 minute discovery calls — repeated questions about industry, stack, and goals — were starting to dominate the calendar.

After

The solution

A bilingual voice agent reachable at a real phone number that runs the discovery call itself, then files a structured briefing in Notion (plus a WhatsApp summary) before the call has fully ended. Three defensive layers — rate limiting, a database-enforced CAD 20/day spend cap, and a heartbeat monitor — keep a misconfigured telephony vendor from quietly burning a five-figure phone bill.

Tech stack

Retell AI Claude API n8n Postgres Notion Cal.com Hetzner Docker

Results

Measured impact

Daily hard cap (DB-enforced)

CAD 20

Concurrent calls

Best model in test

Haiku 4.5

Call a real number. A bilingual voice agent picks up, runs a real discovery conversation, files a structured briefing in Notion before the call ends, and lets a Calgary consultancy show prospects exactly what it builds — by being the thing it builds.

The challenge

Selling AI voice agents is a credibility problem before it is a sales problem. Prospects who have never used one don’t know what to ask, don’t trust the demo videos, and have a thousand opinions shaped by the first generation of robocallers that wasted everyone’s time. A polished landing page can describe the product in six different ways and still lose the meeting, because the only convincing answer to “is it actually good?” is to put the prospect on the phone with one.

Axera Flow had two adjacent problems converging. The Canadian pipeline was warming up and discovery calls — the unpaid, qualifying conversations before a real engagement — were starting to dominate the calendar. Each call was 30–45 minutes of repeated questions about industry, current stack, what they had tried, what they wanted to automate. Worthwhile when the lead converted; expensive when it didn’t.

The practical question the project had to answer: could a single voice agent serve as both the qualifier and the work sample — running the discovery call itself, in two languages, while a prospect was deciding whether Axera Flow was credible enough to hire?

The approach

The first decision was which voice platform to bet on. Retell, Vapi, Bland, and ElevenLabs Voice Agents all had pitch decks pointing at the same problem. After hands-on testing, Retell won on three counts: pay-as-you-go pricing with 20 concurrent calls included before any upgrade was required, lower observed latency on bilingual conversations, and a Claude-native model option that fit the rest of the Axera Flow stack. Vapi lagged on latency in side-by-side tests, Bland’s voice quality felt synthetic in a Calgary accent, and ElevenLabs’ agent product was still too young for production billing.

The harder decision was posture. A vendor that handles voice, telephony, and billing-per-minute is also a vendor that, if misconfigured or attacked, could quietly burn a five-figure phone bill before anyone noticed. So the architecture rule was set before the first call connected: treat Retell as a dependency whose blast radius we control ourselves, not as a vendor whose promises we trust.

This work sits Outside tiers — internal build, not a client engagement — but the engineering bar was Scale-shaped. The same stack that runs this agent is what gets installed for clients who buy the Voice Agent Stack, so corner-cutting on the demo would have meant corner-cutting on the product.

What we built

A voice agent reachable on a dedicated Canadian phone number, running on Retell with Claude Haiku 4.5 driving the conversation. When a call ends, Retell fires a webhook into a self-hosted n8n workflow that validates the request, persists the transcript and cost into a Postgres calls table, generates a structured discovery briefing through the Claude API, and delivers it as a Notion page plus a WhatsApp summary to the founder. Cal.com handles any scheduling the agent agrees to.

Three defensive layers sit between Retell and the bank account. The webhook is rate-limited at the n8n side. A Postgres trigger maintains a rolling 24-hour spend counter and refuses to insert a new call row past CAD 20 of cost — the database itself is the hard cap, not a vendor promise. A workflow monitor with a heartbeat catches the case where Retell stops sending events at all. Whichever layer fails first, the next one catches it.

Several specific platform quirks shaped the implementation and are worth naming, because every one of them surfaced as a production bug before it surfaced as a doc page. Retell’s webhook signature scheme isn’t Stripe’s; the HMAC payload is the timestamp concatenated with the raw body, no separator, and the canonical client examples don’t make that obvious. The combined_cost field arrives in cents, not dollars — useful to know before a dashboard reports calls as costing a hundred times what they actually do. Retell’s webhook timeout is five seconds, which is shorter than a real briefing-generation pipeline takes; responses go out immediately and heavy work happens in the background. And saving an agent config without explicitly publishing it silently spawns a draft v+1 while production calls keep hitting the previous version — a configuration footgun that ate a debugging session.

The landing page is part of the architecture, not a separate concern. A visitor arriving with ?utm_campaign=voice-agent sees the hero swap to “Talk to our AI now — leave your number, our Voice Agent calls back in ~30 seconds. Bilingual EN/PT,” a click-to-dial link, and a backend that knows where the call came from. Everyone else sees the standard form. The attribution stack persists gclid and five UTM parameters for 90 days in localStorage, validated across 30 isolation tests, so the briefing in Notion includes the ad that produced the call, not just the call itself.

What we deliberately did not do: trim the system prompt for latency. It was the tempting lever and we left it alone. The asymmetric risk — a faster agent that asks worse questions in a real discovery call — wasn’t worth the milliseconds. Model swap and knowledge-base optimization dominated the latency budget anyway. Not every optimization is one you should take.

The outcome

The agent went live on May 23, 2026. Real discovery calls have run through it; the founder has been on the receiving end as the lead, and the briefing arrived in Notion before the call had fully ended. The CAD 20/day cap has never tripped in production, which is the boring outcome you want from a hard cap — it sits there doing nothing until the day it does everything.

The most useful empirical result came from a model swap that contradicted the assumption it was supposed to test. Moving the agent from Claude Sonnet 4.6 to Claude Haiku 4.5 was framed as a latency play, with an expected quality trade-off. The actual outcome of a real ~2–3 minute discovery call was the opposite: Haiku gave both lower latency and asked deeper follow-up questions than Sonnet had. The cheaper, faster model produced a better conversation. That finding is now load-bearing in how Axera Flow sizes models for voice-mode work — and a useful counterweight to the default instinct of always reaching for the larger model.

No conversion numbers yet — the call volume is too small to claim a meaningful lift on close rate, and an invented metric would be worse than no metric. What can be said honestly: the agent is in production, the financial blast radius is bounded by code rather than by trust, and the inflection-point assumption about model size has been tested against a real prospect, not a synthetic eval.

Lessons learned

The most useful decisions on this project were about what not to build or change. Not trimming the system prompt. Not trusting a vendor’s promise about runaway billing. Not assuming the bigger model would give the deeper answer. Each refusal was load-bearing, and each one would have been easy to skip under deadline pressure.

The second lesson is one this project shares with every other build that touches a black-box vendor: the platform’s documentation describes the happy path, not the production path. The HMAC quirk, the cents-vs-dollars unit on cost, the silent-draft footgun on publish, the five-second webhook timeout — none of these were in the README. They were learned at the cost of a production incident each. Budget that cost into the next vendor integration, or it will get billed to a debugging session at a worse moment.

All cases

Presentation mode · PDF version