Claude Code Swarm (Part 1): Mailbox Protocol
A practical, implementable protocol for multi-agent coding: filesystem inboxes, typed JSON payloads, and a leader-side poller that applies side effects before delivering messages.
Multi-agent coding is easy to demo and hard to ship.
The demo version is: spawn N agents, run them in parallel, paste their answers together.
The production version is: coordinate approvals, permissions, shutdown, and “who is allowed to do what” while the main session is busy doing something else.
Claude Code’s teams/swarm feature shows a concrete answer: a mailbox protocol that treats coordination as a first-class control plane. This post is an implementable reconstruction of that protocol, reverse engineered from the @anthropic-ai/claude-code bundle.
Part 1 of 3. Next: /swarm-state-machines/. Full series: /swarm-mailbox-protocol/, /swarm-state-machines/, /swarm-mailbox-type-catalog/.
What You’ll Learn
- A transport that is boring on purpose: per-agent inbox files with append + lock.
- A minimal envelope/payload split that keeps chat separate from coordination.
- The “InboxPoller” pattern: receive, apply side effects, then deliver.
- The four coordination flows that make swarms usable: plans, permissions, sandbox, shutdown.
- The failure modes you should design for (busy sessions, races, duplicates).
The Design Choice That Makes Everything Else Easier
Claude Code uses filesystem inbox files as the transport.
Each agent has an inbox file that is a JSON array. New entries are appended under a lock and later marked as read. No message bus. No database. No hidden daemon. Just files.
That choice buys you:
- Auditability: the raw messages are inspectable without special tooling.
- Debuggability: you can see “what was sent” and “what was consumed”.
- Local reliability: append + lock is a boring, effective concurrency primitive.
It also forces discipline: if your protocol isn’t explicit, it will be painfully obvious.
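As a rough illustration of "append + lock", here is a minimal Python sketch. The helper names (inbox_lock, append_message, take_unread) and the sidecar .lock file are assumptions made for this example, not names or mechanisms taken from the Claude Code bundle; the real implementation may lock differently.

```python
import json
import os
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def inbox_lock(inbox_path, timeout=5.0, poll=0.01):
    """Hold an exclusive lock by atomically creating a sidecar .lock file."""
    lock_path = inbox_path + ".lock"  # hypothetical naming convention
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {inbox_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def _load(inbox_path):
    if not os.path.exists(inbox_path):
        return []
    with open(inbox_path, "r", encoding="utf-8") as f:
        return json.load(f)


def _store(inbox_path, entries):
    # Write to a temp file and rename so readers never see a half-written array.
    tmp_path = inbox_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
    os.replace(tmp_path, inbox_path)


def append_message(inbox_path, sender, text, summary=None):
    """Append one envelope entry to the recipient's inbox under the lock."""
    entry = {
        "from": sender,
        "text": text,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "read": False,
    }
    if summary is not None:
        entry["summary"] = summary
    with inbox_lock(inbox_path):
        entries = _load(inbox_path)
        entries.append(entry)
        _store(inbox_path, entries)
    return entry


def take_unread(inbox_path):
    """Return unread entries and mark them read in the same locked section."""
    with inbox_lock(inbox_path):
        entries = _load(inbox_path)
        unread = [dict(e) for e in entries if not e.get("read")]
        for e in entries:
            e["read"] = True
        _store(inbox_path, entries)
    return unread
```

Reading and marking as read inside one locked section is a choice made for this sketch: it keeps two pollers from both picking up the same entry.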
Transport: Inbox Envelope
The on-disk inbox is a JSON array of “envelope entries”.
Envelope shape (simplified):
{
"from": "team-lead",
"text": "...",
"timestamp": "[iso8601]",
"color": "optional",
"summary": "optional",
"read": false
}
Important: the envelope is intentionally generic. All the interesting structure is inside text.
Payloads: Plain Text vs Typed JSON
text is either:
- plain text, used for normal DMs and broadcasts, or
- a JSON string with a type field, used for protocol messages.
That’s the core split:
- plain text drives collaboration
- typed JSON drives coordination
If you want to build multi-agent systems that behave reliably, don’t mix these.
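One way to enforce the split is to classify every text value the moment it is read. The sketch below is an assumption about how you might do it; parse_payload is a hypothetical helper, not a function from the bundle.

```python
import json


def parse_payload(text):
    """Return ("typed", payload_dict) for protocol messages, else ("plain", text)."""
    try:
        value = json.loads(text)
    except (TypeError, ValueError):
        # Not valid JSON at all: ordinary chat text.
        return ("plain", text)
    # Only a JSON object with a string "type" counts as a protocol message.
    if isinstance(value, dict) and isinstance(value.get("type"), str):
        return ("typed", value)
    return ("plain", text)
```

Anything that parses as JSON but lacks a string type field is treated as plain text here, so a teammate who pastes a JSON snippet into a DM does not trigger coordination logic by accident.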
The Leader Primitive: InboxPoller
Multi-agent systems fail when they treat messages as chat.
Claude Code instead treats the inbox as a queue of events. A leader-side loop (the InboxPoller) does three things in order:
- Receive unread entries.
- Apply side effects for typed payloads (permissions, mode, approvals, shutdown cleanup).
- Deliver the remaining messages into the active chat session.
The ordering matters. Side effects must happen before the agent sees the content, otherwise you get weird states like:
- a worker sees “approved” but hasn’t applied the permission mode
- a shutdown is acknowledged but the worker keeps running
- sandbox access is granted but the sandbox still blocks
If you take one idea from this post: treat coordination payloads as a control plane that runs ahead of chat delivery.
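The receive, apply, deliver ordering can be sketched in a few lines of Python. Everything here is illustrative: poll_once, the handlers mapping, and the deliver callback are hypothetical names, and this sketch consumes handled payloads entirely, whereas a real poller may still surface some of them to the session.

```python
import json


def poll_once(unread_entries, handlers, deliver):
    """Process one batch: side effects first, then chat delivery.

    handlers maps a payload type to a function(payload); deliver receives
    the list of envelope entries that should reach the chat session.
    """
    to_deliver = []
    for entry in unread_entries:
        payload = None
        try:
            parsed = json.loads(entry["text"])
            if isinstance(parsed, dict) and isinstance(parsed.get("type"), str):
                payload = parsed
        except ValueError:
            pass  # plain text, not a protocol message
        handler = handlers.get(payload["type"]) if payload else None
        if handler is not None:
            handler(payload)  # control plane: apply the side effect now
        else:
            to_deliver.append(entry)  # plain text or a type with no handler
    # Delivery happens only after every side effect in the batch has run.
    if to_deliver:
        deliver(to_deliver)
    return to_deliver
```

Because delivery is deferred to the end of the batch, a plain message that arrived before a mode_set_request in the same batch still reaches the session after the mode change has been applied.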
Canonical Payload Types (What You Must Implement)
These are the typed JSON payloads that form the core of the swarm protocol.
Permissions:
- permission_request
- permission_response
Sandbox network approvals:
- sandbox_permission_request
- sandbox_permission_response
Plan approval:
- plan_approval_request
- plan_approval_response
Mode control:
- mode_set_request
Shutdown:
- shutdown_request
- shutdown_approved
- shutdown_rejected
Policy propagation (receiver contract observed):
- team_permission_update
Status payloads (structured, but not permission-critical):
- idle_notification
- task_assignment
- task_completed
Normal DMs and broadcasts are just plain-text envelope entries, optionally with a short summary.
Wire Schema Cheat Sheet
Below are practical wire shapes you can implement. These are JSON payloads embedded as strings in text.
permission_request (worker -> leader)
{
"type": "permission_request",
"requestId": "req_123",
"agentId": "worker-agent-id",
"toolName": "Bash",
"toolUseId": "toolu_abc",
"description": "Why the tool is needed",
"input": { "any": "json" },
"permissionSuggestions": []
}
permission_response (leader -> worker)
Success:
{
"type": "permission_response",
"requestId": "req_123",
"subtype": "success",
"response": {
"updatedInput": { "any": "json" },
"permissionUpdates": [{ "any": "json" }]
}
}
Error:
{
"type": "permission_response",
"requestId": "req_123",
"subtype": "error",
"error": "Permission denied"
}
sandbox_permission_request (worker -> leader)
{
"type": "sandbox_permission_request",
"requestId": "sb_123",
"workerId": "worker-agent-id",
"workerName": "worker-1",
"workerColor": "cyan",
"hostPattern": { "host": "example.com" },
"createdAt": 0
}
sandbox_permission_response (leader -> worker)
{
"type": "sandbox_permission_response",
"requestId": "sb_123",
"host": "example.com",
"allow": true,
"timestamp": "[iso8601]"
}
plan_approval_request (worker -> leader)
{
"type": "plan_approval_request",
"from": "worker-1",
"timestamp": "[iso8601]",
"planFilePath": "/abs/path/to/PLAN.md",
"planContent": "markdown plan contents",
"requestId": "plan_123"
}
plan_approval_response (leader -> worker)
Approved:
{
"type": "plan_approval_response",
"requestId": "plan_123",
"approved": true,
"timestamp": "[iso8601]",
"permissionMode": "default"
}
Rejected:
{
"type": "plan_approval_response",
"requestId": "plan_123",
"approved": false,
"feedback": "Need to handle the empty input case",
"timestamp": "[iso8601]"
}
shutdown_request / shutdown_approved / shutdown_rejected
{
"type": "shutdown_request",
"requestId": "shutdown_456",
"from": "team-lead",
"reason": "Task completed",
"timestamp": "[iso8601]"
}
{
"type": "shutdown_approved",
"requestId": "shutdown_456",
"from": "worker-1",
"timestamp": "[iso8601]",
"paneId": "optional",
"backendType": "optional"
}
{
"type": "shutdown_rejected",
"requestId": "shutdown_456",
"from": "worker-1",
"reason": "Still working",
"timestamp": "[iso8601]"
}
mode_set_request (leader -> worker)
{
"type": "mode_set_request",
"mode": "acceptEdits",
"from": "team-lead"
}
team_permission_update (leader -> worker, receiver contract)
In the observed receiver behavior, the consumer requires permissionUpdate.rules and permissionUpdate.behavior and applies them as session-scoped allow rules.
{
"type": "team_permission_update",
"toolName": "Bash",
"directoryPath": "/abs/path",
"permissionUpdate": {
"behavior": "allow",
"rules": [{ "any": "json" }]
}
}
If you design your own protocol, make this message explicit and versioned. This is distributed policy.
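A receiver-side check for this message might look like the sketch below. The version field and SUPPORTED_VERSIONS are hypothetical additions for your own protocol; the observed payload carries no version, so the sketch defaults to 1 when the field is absent.

```python
SUPPORTED_VERSIONS = {1}  # hypothetical: the observed payload has no version field


def validate_team_permission_update(payload):
    """Return the permissionUpdate dict if acceptable, else raise ValueError."""
    if payload.get("type") != "team_permission_update":
        raise ValueError("wrong payload type")
    # Hypothetical "version" field; defaulting to 1 tolerates unversioned senders.
    version = payload.get("version", 1)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version: {version!r}")
    update = payload.get("permissionUpdate")
    if not isinstance(update, dict):
        raise ValueError("permissionUpdate missing")
    # The observed receiver requires both rules and behavior to be present.
    if not isinstance(update.get("rules"), list):
        raise ValueError("permissionUpdate.rules must be a list")
    if not isinstance(update.get("behavior"), str):
        raise ValueError("permissionUpdate.behavior must be a string")
    return update
```

Rejecting a malformed policy update outright is safer than applying part of it, since a half-applied rule set is hard to notice later.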
Status payloads
These payloads exist for UX and workflow visibility.
idle_notification:
{
"type": "idle_notification",
"from": "worker-1",
"timestamp": "[iso8601]",
"idleReason": "available",
"summary": "optional",
"completedTaskId": "optional",
"completedStatus": "optional",
"failureReason": "optional"
}
task_assignment:
{
"type": "task_assignment",
"taskId": "task_123",
"subject": "Fix the failing build",
"description": "Details...",
"assignedBy": "team-lead",
"timestamp": "[iso8601]"
}
task_completed:
{
"type": "task_completed",
"from": "worker-1",
"taskId": "task_123",
"taskSubject": "Fix the failing build",
"timestamp": "[iso8601]"
}
Four Flows You Must Get Right
This protocol is bigger than “send message”.
It is four coordination systems stapled together: plans, permissions, sandboxing, shutdown.
1) Plan approval (worker -> leader -> worker)
A worker in “plan required” mode can’t just proceed. It emits a plan approval request, then blocks until the leader approves or rejects.
This is the simplest form of human-in-the-loop that still scales to multiple workers: the leader is the control point.
2) Tool permission sync (worker -> leader -> worker)
If your workers can run tools, you need a distributed permission decision mechanism.
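The key design detail: responses must be resolved by requestId, not by timing or order. Otherwise parallel tool uses will race: a response can be matched to the wrong request, or a request can wait forever for an answer that was already consumed.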
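A minimal sketch of requestId correlation on the worker side, assuming a hypothetical PendingRequests registry (the name and shape are illustrative, not taken from the bundle):

```python
class PendingRequests:
    """Track in-flight requests and resolve each response by its requestId."""

    def __init__(self):
        self._callbacks = {}

    def register(self, request_id, on_response):
        """Remember what to do when the response for request_id arrives."""
        if request_id in self._callbacks:
            raise ValueError(f"duplicate requestId: {request_id}")
        self._callbacks[request_id] = on_response

    def resolve(self, response):
        """Dispatch a response; return False if no request is waiting for it."""
        callback = self._callbacks.pop(response.get("requestId"), None)
        if callback is None:
            return False
        callback(response)
        return True
```

With this shape, two tool uses can be in flight at once and their responses can arrive in either order; each one lands on the callback registered for its own requestId.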
3) Sandbox network approvals (worker -> leader -> worker)
This is “permissions, but for network”.
In practice it behaves like a queue: workers ask for a host pattern, leader approves, worker continues. Treat it as a first-class protocol. It’s not just a UI prompt.
4) Shutdown (leader -> worker -> leader cleanup)
The clean shutdown loop is what makes multi-agent systems survivable:
- the leader requests shutdown
- the worker must explicitly approve or reject
- approvals trigger cleanup (and may kill the worker’s terminal pane)
This prevents “orphan agents” and makes termination observable and auditable.
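The loop above can be sketched as two small functions, one per side. answer_shutdown, handle_shutdown_reply, the busy_reason parameter, and the cleanup callback are all hypothetical; what cleanup actually does (for example, killing a terminal pane) is left to the caller.

```python
from datetime import datetime, timezone


def answer_shutdown(request, worker_name, busy_reason=None):
    """Worker side: always reply explicitly, either approving or rejecting."""
    reply = {
        "requestId": request["requestId"],
        "from": worker_name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if busy_reason:
        reply["type"] = "shutdown_rejected"
        reply["reason"] = busy_reason
    else:
        reply["type"] = "shutdown_approved"
    return reply


def handle_shutdown_reply(reply, active_workers, cleanup):
    """Leader side: run cleanup once per worker, and only on approval."""
    if reply["type"] != "shutdown_approved":
        return False
    worker = reply["from"]
    if worker in active_workers:
        active_workers.discard(worker)
        cleanup(worker)  # caller-supplied teardown for this worker
    return True
```

Guarding cleanup behind membership in active_workers means a duplicated approval does not tear the same worker down twice.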
The Pitfalls (And the Pattern That Prevents Them)
Pitfall: Delivering messages while the session is busy
If you inject teammate messages mid-turn, you corrupt the active agent loop.
Claude Code’s approach is simple and correct:
- if idle, deliver immediately
- if busy, queue and flush later
If you build your own system, treat “busy vs idle delivery” as core runtime logic, not a UI detail.
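A sketch of the idle/busy rule, assuming a hypothetical DeliveryGate that the runtime notifies at the start and end of every turn:

```python
from collections import deque


class DeliveryGate:
    """Deliver immediately when idle; queue while busy and flush afterwards."""

    def __init__(self, deliver):
        self._deliver = deliver  # function taking a list of messages
        self._queue = deque()
        self.busy = False

    def submit(self, message):
        if self.busy:
            self._queue.append(message)  # never inject into an active turn
        else:
            self._deliver([message])

    def turn_started(self):
        self.busy = True

    def turn_finished(self):
        """Mark the session idle and flush queued messages in arrival order."""
        self.busy = False
        if self._queue:
            batch = list(self._queue)
            self._queue.clear()
            self._deliver(batch)
```

Flushing the queue as a single batch keeps arrival order intact and gives the session one well-defined injection point between turns.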
Pitfall: Multiple consumers for the same response
In real systems, messages can be observed from more than one place (poll loops, global receivers, or UI-driven listeners).
The only safe pattern is:
- centralize resolution by request id
- make it idempotent
- tolerate races
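One way to get all three properties is a single shared ledger that every consumer must ask before acting on a response. ResponseLedger and claim are hypothetical names for this sketch; it assumes all consumers live in one process and can share a lock.

```python
import threading


class ResponseLedger:
    """Central record of resolved request ids, safe to call from many consumers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._resolved = {}

    def claim(self, response):
        """Return True for exactly one caller per requestId, False for the rest."""
        request_id = response["requestId"]
        with self._lock:
            if request_id in self._resolved:
                return False  # someone already handled it; safe to ignore
            self._resolved[request_id] = response
            return True
```

A consumer applies the response only when claim returns True. Every other observer of the same message sees False and does nothing, so duplicates and races become harmless.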
Minimal Implementation Checklist
If you want to implement a Claude Code style mailbox protocol, start here:
- Define an envelope format and treat text as the payload carrier.
- Define typed JSON payloads with stable type strings.
- Build a leader-side poller that:
  - reads unread entries
  - applies side effects for typed messages
  - delivers the remaining messages to the active session
- Implement request-id based correlation and idempotent response handling.
- Implement explicit shutdown and cleanup.
- Add structured telemetry around every state transition.
This is the difference between “parallel” and “coordinated”.
Related
- /swarm-state-machines/: leader/worker state transitions and invariants for correctness under retries.
- /permission-control-plane/: the same idea applied to tool use and sandbox access.
- /streaming-control-plane/: when the transport isn't files, you still want typed control messages.
- /telemetry-first-architecture/: how to make these flows observable without guessing.