The founder runs four product fronts in parallel and a discovery-call funnel on top of them. He didn’t need another dashboard. He needed one source of truth that could read across every front, decide what was actually worth a notification, and stay quiet otherwise. So we built an operations agent that lives in a container on a VPS, runs on a schedule, and only interrupts when something genuinely changed.
The challenge
Most founders running multiple parallel initiatives reach the same breaking point: the cost of knowing what’s happening across all of them exceeds the cost of doing the work on any one of them. The Voice Agent pipeline has its own dashboards. The proposal system has its own Postgres. The CRM has Notion. WhatsApp has the actual customer conversations. Each tool is reasonable in isolation; collectively they ask the founder to hold the union of all of them in his head every morning.
The problem wasn’t being lost on any single front — it was being lost between them. The information needed to decide what to work on next was scattered across five tools and three databases, and assembling it took the first hour of every working day.
The practical question this project had to answer: could a single agent read across the whole operation — Notion, Postgres, the doc repo — generate one digest per morning and one summary per week, and quietly escalate to WhatsApp only when something actually broke? Not another dashboard. Not another inbox. One agent, two scheduled reports, and a silent runtime in between.
The approach
The first architectural decision was: which side of the trust boundary does this agent live on? An agent that can read every internal database and decide what to surface is a credential-handling system before it is a productivity tool. So the rule was set before code was written — the agent runs in a container, on a VPS the founder owns, with .env mounted from outside the image and git deploy keys scoped per repo. No third-party automation platform, no managed agent runtime, no shared credential surface.
The second decision was the SDK. claude-agent-sdk (Python) gave the right primitives (tool calls, scheduled invocations, persistent telemetry), but it ships as a thin Python wrapper around the Claude Code CLI. That meant the container had to carry Node 20 LTS plus @anthropic-ai/claude-code alongside Python, and the image grew from 154 MB to 1.29 GB in a single commit. The original Phase 0 acceptance criterion of under 500 MB became unachievable. We kept the SDK and rewrote the criterion, because the alternative, implementing tool dispatch by hand against the raw Anthropic API, would have meant rebuilding what the SDK already shipped, badly.
This work sits outside the tiers: it is an internal build, not a client engagement. The engineering bar was still Scale-shaped, because anything we cut here would also be cut for the next client who asked for an internal operations agent.
What we built
A Python service running in a Docker container on a Hetzner VPS, exposing subcommands for both scheduled and ad-hoc use: ask, chat, digest, weekly, feedback, scheduler. The scheduler is the production mode, and it has grown with the operation — now seven APScheduler jobs against America/Edmonton: a daily digest at 07:00, a weekly summary on Sundays at 20:00, a 60-second uptime heartbeat into Uptime Kuma, a Sunday drift audit, two interval health checks (WhatsApp instance connection, stale chat sessions), and a 06:30 marketing-data ingest (more on that below). For ad-hoc use, ask answers one-shot questions and chat runs an interactive consult — originally over SSH from the founder’s phone, now WhatsApp-first through a sibling FastAPI service.
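A minimal sketch of how the seven jobs could be registered, assuming APScheduler 3.x. The job ids, the `handlers` mapping, the drift-audit time, and the two health-check intervals are illustrative assumptions. Only the digest, weekly, heartbeat, and ingest timings come from the description above.

```python
from typing import Any, Callable, Dict, List

TIMEZONE = "America/Edmonton"

# One entry per job. "when" holds the keyword arguments for the trigger.
JOBS: List[Dict[str, Any]] = [
    {"id": "daily_digest", "trigger": "cron", "when": {"hour": 7, "minute": 0}},
    {"id": "weekly_summary", "trigger": "cron",
     "when": {"day_of_week": "sun", "hour": 20, "minute": 0}},
    {"id": "heartbeat", "trigger": "interval", "when": {"seconds": 60}},
    # Assumed time: the text only says the drift audit runs on Sundays.
    {"id": "drift_audit", "trigger": "cron",
     "when": {"day_of_week": "sun", "hour": 18, "minute": 0}},
    # Assumed intervals: the text does not state them.
    {"id": "whatsapp_connection", "trigger": "interval", "when": {"minutes": 5}},
    {"id": "stale_chat_sessions", "trigger": "interval", "when": {"minutes": 15}},
    {"id": "marketing_ingest", "trigger": "cron", "when": {"hour": 6, "minute": 30}},
]


def build_scheduler(handlers: Dict[str, Callable[[], None]]):
    """Register every job in JOBS against the configured timezone."""
    # Imported lazily so the job table can be inspected without APScheduler installed.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler(timezone=TIMEZONE)
    for job in JOBS:
        scheduler.add_job(handlers[job["id"]], job["trigger"], id=job["id"], **job["when"])
    return scheduler
```

Calling `build_scheduler(handlers).start()` would block and run the jobs. Keeping the schedule as data makes it easy to review the whole timetable in one place.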
The agent reads the live operation through three sources: a Postgres MCP server for structured data, the Notion MCP server for the case-studies and CRM databases, and a private git repository of canonical docs mounted into the container so the agent can read the knowledge base the same way a human would. Since launch its reach has widened past internal state: a 06:30 job now ingests Google Analytics 4 (by channel) and Google Ads (by campaign) into Postgres on a rolling 7-day window, so “read across every front” now literally includes the ad spend, and the morning digest can speak to campaign performance, not just system health. Telemetry (every invocation, with its mode, duration, token count, and success or failure) is appended to a log on a bind-mounted volume and committed back to that same docs repository from inside the container, signed by a dedicated bot identity using a deploy key scoped only to that repository.
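As an illustration of the telemetry append, here is a minimal sketch that assumes a JSON Lines file. The function name, field names, and file layout are hypothetical, since the source does not describe the log format, and the commit back to the docs repository is left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_telemetry(log_path: Path, mode: str, duration_s: float,
                     tokens: int, success: bool) -> dict:
    """Append one invocation record as a single JSON line and return it."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "mode": mode,                      # e.g. "ask", "chat", "digest", "weekly"
        "duration_s": round(duration_s, 2),
        "tokens": tokens,
        "success": success,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier records intact; one line per invocation.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

An append-only, line-per-record file diffs cleanly in git, which suits a log that is committed to a repository.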
The alerting layer is the part most teams get wrong, so it got the most careful design. WhatsApp alerts via the self-hosted Evolution API fire only on transitions — an ok→failed run sends a failure alert, a failed→ok run sends a recovery, but a failed→failed run is silent. The state lives in a JSON file written atomically (tempfile + os.replace) so a crash mid-write can’t corrupt the alert state. The whole alert path is wrapped best-effort: if WhatsApp is down, the digest still runs and the failure is logged but never raised. Alerts notify; they never block. A later hardening gave the alert path a second WhatsApp instance as fallback, so the channel that reports failures has its own redundancy — the monitoring can’t go quietly dark on a single dependency.
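The transition rule, the atomic state write, and the best-effort wrapper can be sketched together. This is a sketch under stated assumptions, not the production code: the function names, the `{"status": ...}` file schema, and the `send` callable are hypothetical, and the fallback WhatsApp instance is omitted.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Optional


def load_status(path: Path) -> str:
    """Return the last recorded status, defaulting to "ok" when no usable state exists."""
    try:
        return json.loads(path.read_text(encoding="utf-8")).get("status", "ok")
    except (FileNotFoundError, json.JSONDecodeError, AttributeError):
        return "ok"


def save_status(path: Path, status: str) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({"status": status}, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except Exception:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def transition(previous: str, current: str) -> Optional[str]:
    """Map a status change to an alert kind; unchanged status is silent."""
    if previous == "ok" and current == "failed":
        return "failure"
    if previous == "failed" and current == "ok":
        return "recovery"
    return None


def record_run(path: Path, current: str, send: Callable[[str], None]) -> Optional[str]:
    """Persist the new status and alert only on a transition, never raising from send."""
    kind = transition(load_status(path), current)
    save_status(path, current)
    if kind is not None:
        try:
            send(f"[agent] {kind}")
        except Exception as exc:  # best-effort: log and carry on
            print(f"alert send failed (ignored): {exc}")
    return kind
```

Saving the state before sending means a failed send does not cause the same transition to be reported again on the next run. That is a design choice in this sketch; retrying unsent alerts would need the opposite order.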
Several platform quirks shaped the implementation and are worth naming. The claude-agent-sdk Python package wraps the Claude Code Node CLI via subprocess, so the container needs both runtimes. The CLI’s --dangerously-skip-permissions flag, which the SDK passes when permission_mode="bypassPermissions", is refused when EUID == 0, so the container runs as a non-root user with UID 1000 to match the bind-mount owner on the host. Sudo on the VPS is deliberately not NOPASSWD, so the deployment split cleanly: the agent installs everything that doesn’t need root, and the founder runs the few commands that do. None of this was in the README of any component involved.
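One way to surface the root restriction early is a startup guard. The sketch below is hypothetical: the function names and the error message are assumptions, and `os.geteuid` is available on Unix only.

```python
import os
import sys


def bypass_allowed(euid: int) -> bool:
    """Per the text, the CLI refuses the bypass flag when the effective UID is 0."""
    return euid != 0


def preflight() -> None:
    """Exit with a readable message instead of failing later inside the CLI subprocess."""
    if not bypass_allowed(os.geteuid()):
        sys.exit("refusing to start as root: run the container as the UID 1000 user")
```

Failing at startup gives a clearer error than a refused subprocess call in the middle of a scheduled job.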
The weekly summary needed one more piece. WhatsApp has a practical per-message limit, so a full weekly summary would be silently truncated. The fix was a _chunk_text helper that splits on \n\n boundaries (hard-splits only when no boundary exists inside 3,500 characters), prepends a [Weekly N/M] header when there’s more than one chunk, and inserts a 1-second delay between sends so the receiving Evolution API doesn’t drop a burst.
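The chunking behaviour described above might look like the following sketch. The names `chunk_text`, `with_headers`, and `send_chunks`, and the `send` callable, are hypothetical stand-ins for the real helper. The sketch also assumes the header may sit on top of the 3,500-character window.

```python
import time
from typing import Callable, List

MAX_CHARS = 3500  # boundary window named in the text


def chunk_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Pack paragraphs into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs from repeated blank lines
        # Hard-split only when a paragraph has no boundary inside the limit.
        while len(paragraph) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def with_headers(chunks: List[str], label: str = "Weekly") -> List[str]:
    """Prefix [Label N/M] only when there is more than one chunk."""
    total = len(chunks)
    if total <= 1:
        return chunks
    return [f"[{label} {i}/{total}]\n{chunk}" for i, chunk in enumerate(chunks, start=1)]


def send_chunks(text: str, send: Callable[[str], None], delay_s: float = 1.0) -> int:
    """Send each chunk in order, pausing between sends but not before the first."""
    messages = with_headers(chunk_text(text))
    for index, message in enumerate(messages):
        if index:
            time.sleep(delay_s)
        send(message)
    return len(messages)
```

Packing whole paragraphs first keeps each message readable on its own, and the hard split is only a fallback for a single oversized paragraph.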
The outcome
The agent has been live on the Hetzner VPS since 23-May-2026 and in daily use from that day. The production container reports Up and healthy with a restart count of 0; the uptime heartbeat answered 200 OK to Uptime Kuma at 09:18 MDT on the day the scheduler first started and has answered every minute since. The morning digest fires every day, the weekly summary every Sunday. The test suite has grown with the agent, from 42 tests when alerting first landed to 293 passing today, as ingestion, audits, and the WhatsApp chat path were each added under test.
The agent has been exercised against the live operation in ad-hoc mode while the scheduled jobs warmed up. The most useful proof was a real query asked from the container: “what’s the state of the Axera Flow fronts today?” The agent returned 4,445 tokens of structured answer in 87 seconds, including a finding the human hadn’t caught — three duplicate Voice Call records from the same prospect, sitting in the Voice Calls table waiting for someone to notice. The founder noticed only because the agent surfaced it. That single output justified the build.
The phone path also worked first try. Over SSH, the chat command opened an interactive session and returned a 585-token reply in about 12 seconds — slower than a desktop terminal would be, but well inside the founder’s spec for “a quick consult from the phone.”
The architectural bet — that one agent reading across every front could replace the hour-long morning scan — has had weeks to prove out, and it held: the digest is a daily driver, the manual scan is gone, and the agent’s reach only widened. It now ingests marketing data, answers a WhatsApp consult from the phone, and carries a fact-capture loop — when the founder corrects it in plain language (“the BR instance is off on purpose”), the agent tags the fact and feeds it back into the canonical docs, so the next morning’s digest reasons from a corrected picture instead of repeating a stale one. The credentials still never left the VPS the founder owns, and the alert path still only interrupts on real transitions. The thing that was a hypothesis at launch is now just how the operation runs.
Lessons learned
The most useful decisions on this project were about posture, not features. Running the agent inside a container the founder owns, not on someone else’s automation platform. Treating alerts as a transition signal, not a stream. Letting a 1.29 GB image stand because the alternative was worse code. Wiring the alert path best-effort so a WhatsApp outage couldn’t take down a digest. Each of these would have been easy to compromise under deadline pressure, and each one is doing real work in production.
The second lesson is operational: building an agent that decides what’s worth surfacing is mostly an exercise in writing the silent path well. The loud path — generating a digest, sending a message — is the easy part. The hard part is making sure nothing fires when nothing happened. Transition-only state, atomic state writes, best-effort send wrappers, a 60-second heartbeat that proves the agent is alive without saying anything substantive — these aren’t features the founder will ever see. They’re the reason he won’t be woken up at 3 a.m. by a duplicate notification about a failure that already self-recovered.