// SERVICE · CUSTOM INTERNAL AGENTS

Multi-step agents that finish the job.

WHO THIS IS FOR

For teams that need agents to do real, multi-step work (research, support, ops) and want them production-grade, not autocomplete-with-extra-steps.

Production agents with the eval, observability, and safety scaffolding to actually trust them in front of customers. Tool-calling, memory, recovery from failure, and a human-in-the-loop path for everything that matters.

// USE CASES

What we actually build.

01
PROBLEM
Research agent
Strategy team spends 4 hours researching every prospect. Output is inconsistent.
SOLUTION
Agent with web search, internal CRM access, and a structured brief template. Reviewed before delivery.
STACK
Claude · Brave Search · Python
TIMELINE
4–5 weeks
02
PROBLEM
Support agent (front-line)
L1 support is 60% repetitive questions. Tier-2 backlog keeps growing.
SOLUTION
Agent with knowledge-base RAG + ticket-tool calls + clean escalation path. Always cites sources.
STACK
Claude · Zendesk · pgvector
TIMELINE
5–7 weeks
03
PROBLEM
Voice agent
Inbound calls for appointment booking, after-hours triage, or status updates flood your team.
SOLUTION
Voice agent with tool calls, transfer-to-human, transcript logging, and refusal on out-of-scope asks.
STACK
Vapi · OpenAI Realtime · Twilio
TIMELINE
4–6 weeks
04
PROBLEM
Knowledge-base agent
8 years of Notion + Slack + Drive content. Nobody can find anything.
SOLUTION
Agent with permissioned retrieval across sources, citations, and an evals harness on real questions.
STACK
Claude · pgvector · Slack
TIMELINE
5–6 weeks
05
PROBLEM
Sales / ops copilot
Reps need an internal 'ask anything' copilot: pipeline status, account history, next steps.
SOLUTION
Agent with read-only access to Salesforce + Gong + emails + full audit log. Read-only by design.
STACK
OpenAI · Salesforce · Gong
TIMELINE
6–8 weeks
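A pattern several of these builds share is retrieval that must cite its sources and escalates instead of guessing. A minimal sketch of that shape — `search` and the dict fields are illustrative, not the shipped API; the real synthesis step is an LLM call:

```python
def answer_with_citations(question, search):
    """Retrieve, answer with citations, and escalate rather than answer uncited."""
    hits = search(question)                      # [(doc_id, text), ...] from the KB
    if not hits:
        # No grounding found: refuse and route to a human instead of guessing.
        return {"answer": None, "citations": [], "escalate": True}
    citations = [doc_id for doc_id, _ in hits]
    answer = " ".join(text for _, text in hits)  # stand-in for the LLM synthesis step
    return {"answer": answer, "citations": citations, "escalate": False}
```

The invariant is simple: no citation list, no answer. That is what makes "always cites sources" enforceable rather than a prompt-level hope.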
// SAMPLE DELIVERABLE

What you walk away with.

agents-handoff/ — 14 files · 2.4 MB
📁 agents-handoff/
📄 README.md
📄 architecture.md
📁 evals/
groundtruth.jsonl
dashboard.html
regression.yml
📁 src/
agent.py
tools.py
prompts.yaml
📁 deploy/
terraform/
monitoring.json
📄 runbook.md
🎬 walkthrough.loom
# agents-handoff
## Stack: Claude · OpenAI · pgvector · Python

## What's in here

Complete handoff for the custom agent build. Includes tool
allowlists, audit log schema, and kill-switch config.

## Agents included
- research-agent — web + CRM brief generation
- support-agent — RAG + escalation + citations
- kb-agent — permissioned cross-source retrieval

## Running evals
$ python evals/regression.py --suite groundtruth.jsonl

## Safety defaults
Read-only by default. Write actions require explicit confirmation.
PII filtering active. Rate limits set in monitoring.json.

## Loom walkthrough
24-minute recorded handoff at walkthrough.loom
// FAQ

Common questions for this kind of work.

01
What guardrails do you put on agents?
Read-only by default. Tool allowlists, structured output validation, output filtering for PII, hard rate limits, full audit log, and a kill switch. Write actions always require explicit user confirmation unless you opt out.
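The allowlist-plus-confirmation pattern above reduces to a small gate in code. A hedged sketch — the tool names, `AUDIT_LOG`, and `execute_tool` are illustrative stand-ins, not the delivered implementation:

```python
READ_ONLY_TOOLS = {"search_kb", "get_ticket"}   # safe by default
WRITE_TOOLS = {"update_ticket"}                  # gated behind confirmation
AUDIT_LOG = []                                   # every write lands here

def run(name, args):
    return f"ran {name}"  # stand-in for the real tool call

def execute_tool(name, args, confirmed=False):
    """Allowlist gate: unknown tools are rejected, write tools need explicit confirmation."""
    if name in READ_ONLY_TOOLS:
        return run(name, args)
    if name in WRITE_TOOLS:
        if not confirmed:
            raise PermissionError(f"write tool {name!r} requires explicit confirmation")
        AUDIT_LOG.append((name, args))           # full audit trail for writes
        return run(name, args)
    raise ValueError(f"tool {name!r} is not on the allowlist")
```

The point of the structure: the default path can only read, writes are impossible without an explicit flag, and anything not on a list fails closed.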
02
How do you measure if an agent is 'working'?
Three layers: (1) per-step evals (does the tool call have the right args?), (2) end-to-end task evals (did it finish the goal?), (3) production traces with sampled human review. We ship the dashboard.
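Layer (1) is the cheapest to show concretely: compare a recorded trace step against ground truth. A minimal sketch, assuming traces and expectations are dicts with `tool`/`args` fields (field names are illustrative):

```python
def step_eval(trace_step, expected):
    """Per-step eval: did the agent pick the right tool with the right arguments?

    Only the expected args are checked, so extra args the agent adds don't fail the step.
    """
    return (trace_step["tool"] == expected["tool"]
            and all(trace_step["args"].get(k) == v for k, v in expected["args"].items()))
```

Layers (2) and (3) build on the same trace data: end-to-end evals score the final artifact against a rubric, and production sampling sends a slice of real traces to human review.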
03
Multi-agent or single-agent?
Single-agent with tools by default. Multi-agent only when there's a real boundary in capability or context window. 'Agent of agents' is usually a red flag: extra coordination cost without quality gain.
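The default shape — one agent, a flat tool dict, a bounded loop — fits in a few lines. A sketch, not the production code: `model` here is a stand-in for the LLM call, which in practice returns either a tool request or a final answer:

```python
def run_agent(task, tools, model, max_steps=10):
    """Single agent with tools: one loop, one flat allowlisted tool dict, a hard step limit."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                   # step limit doubles as a runaway guard
        action = model(messages)                 # stand-in for the Claude/OpenAI call
        if action["type"] == "final":
            return action["content"]
        result = tools[action["tool"]](**action["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step limit reached")
```

Most "multi-agent" proposals are this loop with extra hops; splitting only pays off when one context window genuinely can't hold the job.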