Claude Code Swarm (Part 2): Leader/Worker State Machines

A practical blueprint for multi-agent swarms: state transitions, idempotency, retries, and shutdown that doesn’t break under real delivery.

February 6, 2026 · multi-agent, coordination, state-machine, protocol, coding-agents, claude-code

Most multi-agent systems fail for one reason: they treat coordination like chat.

Chat is not reliable under retries. Chat has no replay semantics. Chat is not idempotent.

If you want a swarm you can ship, you need explicit state machines on both sides:

the leader (the only place where decisions become policy)
the worker (an execution engine that never outruns its approvals)

This post lays out the minimum state machines and rules you need to make swarms behave predictably.

Part 2 of 3. Prev: /swarm-mailbox-protocol/. Next: /swarm-mailbox-type-catalog/.

What You’ll Learn

The minimum leader/worker states you need for correctness.
A mailbox-driven event loop that is safe under duplicates and restarts.
How plan approval, permission sync, sandbox approvals, and shutdown fit together.
Invariants you can test.

The One Rule That Prevents Chaos

Treat every coordination message as an event in a state machine, not as text to be “interpreted”.

That means:

every request has an id
every response is correlated
every state transition is monotonic
retries are expected
duplicates are tolerated

If you don’t do this, your swarm will work until it doesn’t, and then you won’t be able to debug it.

The Transport Assumption

Assume you have a “mailbox” transport that may deliver messages:

out of order
duplicated
late (after the sender believes it is done)

Assume the transport can also be replayed after a crash.

These are not pessimistic assumptions. They are what happens when you build on files, background loops, user interaction, and long-running sessions.

So the protocol has to be correct even when delivery isn’t.

Message Model

Use an envelope plus a typed payload.

Envelope (transport-level, generic):

from
timestamp
text (either plain text, or a JSON string)
read (delivery bookkeeping)

Payload (protocol-level, typed):

type (required)
requestId (required for requests and most responses)
timestamp (required)
body (type-specific)

The key design choice: typed payloads drive side effects; plain text never does.
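As a minimal sketch of that split, assuming the envelope's text field holds either plain text or a JSON object: the helper name `parse_payload`, the message type `plan_approval`, and the exact validation rules are illustrative choices, not part of any existing API.

```python
import json
from typing import Any, Optional


def parse_payload(envelope: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the typed payload carried in envelope["text"], or None for plain text."""
    try:
        payload = json.loads(envelope.get("text", ""))
    except (json.JSONDecodeError, TypeError):
        return None  # plain text: chat only, never a side effect
    # Only a JSON object with a string "type" and a timestamp counts as typed.
    if not isinstance(payload, dict):
        return None
    if not isinstance(payload.get("type"), str) or "timestamp" not in payload:
        return None
    return payload


typed = {
    "from": "leader", "timestamp": 1, "read": False,
    "text": json.dumps({"type": "plan_approval", "requestId": "r1",
                        "timestamp": 1, "body": {"approved": True}}),
}
chat = {"from": "leader", "timestamp": 2, "read": False, "text": "looks good, go ahead"}
```

Here `parse_payload(chat)` returns None, so the text "looks good, go ahead" can never approve anything by accident; only the typed envelope can.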

Worker State Machine

Here is a minimal worker lifecycle that stays correct under concurrency.

States:

booting
waiting_for_mode (optional, if leader can push “mode” or policy)
idle
plan_proposed (waiting for approval)
executing
blocked_on_permission
blocked_on_sandbox
shutting_down
terminated

Transitions (high level):

booting -> idle (passing through waiting_for_mode first, if the leader pushes a mode or policy)
idle -> plan_proposed when plan-mode is required and a task starts
plan_proposed -> executing only after plan approval
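executing -> blocked_on_permission when a tool needs approval
executing -> blocked_on_sandbox when network access needs approval
blocked_on_permission or blocked_on_sandbox -> executing once the leader’s decision has been applied
executing -> idle when work completes (or when the task is cancelled)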
any state -> shutting_down on shutdown request
shutting_down -> terminated after acknowledgment and cleanup

Worker invariants (the ones worth testing):

A worker never executes a task action unless it is in executing.
A worker never enters executing from idle if plan-mode is required.
A worker never applies “permission granted” unless the requestId matches an outstanding request.
A worker always acknowledges shutdown, even if mid-execution.
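One way to make the first two invariants testable is a transition table with guards. The sketch below is a hypothetical `Worker` class, not a reference implementation; it assumes a blocked worker returns to executing once a decision is applied.

```python
# Allowed transitions, taken from the list above. The blocked_* -> executing
# entries are an assumption about how a worker resumes after a decision.
ALLOWED = {
    "booting": {"waiting_for_mode", "idle"},
    "waiting_for_mode": {"idle"},
    "idle": {"plan_proposed", "executing"},
    "plan_proposed": {"executing"},
    "executing": {"blocked_on_permission", "blocked_on_sandbox", "idle"},
    "blocked_on_permission": {"executing"},
    "blocked_on_sandbox": {"executing"},
    "shutting_down": {"terminated"},
    "terminated": set(),
}


class Worker:
    def __init__(self, plan_mode_required: bool) -> None:
        self.state = "booting"
        self.plan_mode_required = plan_mode_required

    def transition(self, target: str) -> None:
        # Shutdown is reachable from every state that is still alive.
        if target == "shutting_down" and self.state != "terminated":
            self.state = target
            return
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        # Invariant: no idle -> executing shortcut when plan-mode is required.
        if self.state == "idle" and target == "executing" and self.plan_mode_required:
            raise ValueError("plan approval required before executing")
        self.state = target

    def run_action(self, action):
        # Invariant: task actions run only in the executing state.
        if self.state != "executing":
            raise RuntimeError(f"cannot act while {self.state}")
        return action()
```

Because every state change goes through `transition`, a test can drive the worker with arbitrary event sequences and assert that illegal moves raise instead of silently succeeding.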

Leader State Machine

The leader is not just a “router”. It is the policy authority.

States (per worker):

starting
ready
awaiting_plan
awaiting_permission_decision
awaiting_sandbox_decision
awaiting_shutdown_ack
stopped

Transitions:

starting -> ready when the leader can deliver messages and receive a heartbeat or first contact
ready -> awaiting_plan when leader assigns a task that requires a plan
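awaiting_plan -> ready after the plan is approved and the worker is unblocked
ready -> awaiting_permission_decision when a permission request arrives
ready -> awaiting_sandbox_decision when a sandbox request arrives
awaiting_permission_decision or awaiting_sandbox_decision -> ready once the leader has emitted its decision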
any -> awaiting_shutdown_ack when leader initiates shutdown
awaiting_shutdown_ack -> stopped after ack and cleanup

Leader invariants:

For a given worker, only the leader can change the worker’s effective policy.
For a given requestId, the leader emits at most one terminal decision (approve or deny).
The leader can safely retry sends; workers must be idempotent.
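The "at most one terminal decision" invariant can be enforced with a write-once record keyed by requestId. This is a small sketch; `DecisionLog` is a hypothetical helper name.

```python
class DecisionLog:
    """Records the leader's terminal decision per requestId (hypothetical helper)."""

    def __init__(self) -> None:
        self._decisions: dict[str, str] = {}

    def decide(self, request_id: str, decision: str) -> str:
        if decision not in ("approve", "deny"):
            raise ValueError(f"unknown decision: {decision}")
        # setdefault keeps the first decision; later calls cannot overwrite it.
        return self._decisions.setdefault(request_id, decision)
```

A retried send then calls `decide` again and gets the original answer back, so a resend can never flip an approval into a denial.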

Idempotency and Dedup: The Non-Optional Part

If your protocol doesn’t specify what happens on duplicates, you don’t have a protocol.

Implement:

a per-worker seenRequestIds set (bounded by time or count)
a per-worker pending map for outstanding requests by requestId
last-write-wins rules for policy updates, keyed by timestamp

Processing rule:

Parse typed payload.
If it has a requestId and it was already handled, ignore it.
Otherwise, apply the state transition and record it as handled.

This makes crash recovery and “poller restarts” safe.
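The processing rule can be sketched as follows, under two assumptions that the text above does not spell out: the seen set is bounded by count with oldest-first eviction, and the dedup key is (type, requestId), since a request and its response share an id. `Deduper` is a hypothetical name.

```python
from collections import OrderedDict


class Deduper:
    def __init__(self, max_seen: int = 1024) -> None:
        self.max_seen = max_seen
        self._seen: "OrderedDict[tuple, None]" = OrderedDict()

    def handle(self, payload: dict, apply) -> bool:
        """Apply the payload once. Returns False when it is a duplicate."""
        request_id = payload.get("requestId")
        if request_id is None:
            apply(payload)  # nothing to correlate on; apply must itself be idempotent
            return True
        key = (payload["type"], request_id)
        if key in self._seen:
            return False
        apply(payload)
        self._seen[key] = None  # record only after the transition succeeded
        if len(self._seen) > self.max_seen:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True
```

Recording the key only after `apply` returns means a crash mid-transition leads to a replay rather than a silently dropped message; the cost is that `apply` may run twice, which is why the transition itself should be idempotent as well.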

The Four Flows You Must Get Right

1. Plan approval

Goal: prevent the worker from taking irreversible action before a human (or leader policy) reviews intent.

Rules:

plan approval gates entry into executing
approval response is tied to a requestId
approval may carry policy (example: permission mode) that must be applied before the worker is unblocked and before the approval is delivered as chat output
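These three rules can be sketched in one handler. The payload shape (`body.approved`, `body.permissionMode`), the mode values, and the class name `PlanGate` are assumptions made for illustration; the sketch also assumes a non-approval leaves the worker in plan_proposed.

```python
class PlanGate:
    def __init__(self, pending_plan_id: str) -> None:
        self.state = "plan_proposed"
        self.pending_plan_id = pending_plan_id  # requestId of the proposed plan
        self.permission_mode = "default"
        self.policy_timestamp = 0
        self.chat_output: list[str] = []

    def on_plan_approval(self, payload: dict) -> None:
        if payload.get("requestId") != self.pending_plan_id:
            return  # stale, duplicate, or unrelated approval: ignore
        body = payload.get("body", {})
        if not body.get("approved"):
            return
        # 1. Apply carried policy first (last-write-wins by timestamp).
        mode = body.get("permissionMode")
        if mode is not None and payload["timestamp"] >= self.policy_timestamp:
            self.permission_mode = mode
            self.policy_timestamp = payload["timestamp"]
        # 2. Only then unblock the worker and surface the message as chat.
        self.pending_plan_id = None
        self.state = "executing"
        self.chat_output.append("plan approved")
```

Clearing `pending_plan_id` is what makes a duplicate approval a no-op: the second copy no longer matches an outstanding request.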

2. Permission sync

Goal: make tool execution safe under concurrency.

Rules:

permission requests are events, not “interruptions”
the worker blocks tool execution while in blocked_on_permission
the leader’s decision is terminal for that requestId
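A worker-side sketch of these rules, with a pending map keyed by requestId. The message type `permission_request`, the `granted` field, and the class name `PermissionSync` are hypothetical.

```python
import time


class PermissionSync:
    def __init__(self) -> None:
        self.state = "executing"
        self.pending: dict[str, str] = {}   # requestId -> tool name
        self.results: dict[str, bool] = {}  # requestId -> granted?

    def request(self, request_id: str, tool: str) -> dict:
        """Record the outstanding request and block before anything is sent."""
        self.pending[request_id] = tool
        self.state = "blocked_on_permission"
        return {"type": "permission_request", "requestId": request_id,
                "timestamp": time.time(), "body": {"tool": tool}}

    def on_decision(self, payload: dict) -> bool:
        request_id = payload.get("requestId")
        if request_id not in self.pending:
            return False  # no outstanding request: never apply the decision
        del self.pending[request_id]
        self.results[request_id] = bool(payload["body"]["granted"])
        if not self.pending:
            self.state = "executing"  # resume only when nothing is outstanding
        return True
```

A decision that arrives early or twice returns False and changes nothing, which is exactly the behavior the worker invariant on matching requestIds asks for.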

3. Sandbox approvals (network)

Goal: make network access explicit, auditable, and reversible.

Rules:

sandbox decisions are treated like permissions, but scoped to capability (network) not tool name
approvals must be applied before the worker resumes
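A rough sketch of capability-scoped grants with an audit trail. All names here (`SandboxPolicy`, `resume_after_decision`, the `capability` and `approved` fields) are hypothetical, and the sketch assumes a denial leaves earlier grants untouched while an explicit revoke removes them.

```python
class SandboxPolicy:
    """Capability-scoped grants with an audit trail (hypothetical helper)."""

    def __init__(self) -> None:
        self.granted: set[str] = set()
        self.audit: list[tuple[str, str, str]] = []  # (source, capability, action)

    def apply_decision(self, request_id: str, capability: str, approved: bool) -> None:
        if approved:
            self.granted.add(capability)
        self.audit.append((request_id, capability, "approve" if approved else "deny"))

    def revoke(self, capability: str, reason: str) -> None:
        self.granted.discard(capability)
        self.audit.append((reason, capability, "revoke"))

    def allows(self, capability: str) -> bool:
        return capability in self.granted


def resume_after_decision(policy: SandboxPolicy, payload: dict) -> bool:
    """Apply the decision first, then report whether the capability may be used."""
    body = payload["body"]
    policy.apply_decision(payload["requestId"], body["capability"], body["approved"])
    return policy.allows(body["capability"])
```

Keying on the capability rather than the tool name means any tool that reaches the network is covered by the same grant, and the audit list records every approve, deny, and revoke in order.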

4. Shutdown

Goal: make shutdown graceful and correct, even while busy.

Rules:

shutdown request is always handled (no “I missed it”)
the worker acknowledges exactly once per requestId
the leader considers the worker live until ack is observed
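Both sides of the shutdown handshake can be sketched briefly. The message type `shutdown_ack` and the class names are assumptions; cleanup and the final move to terminated are left out.

```python
import time


class ShutdownWorker:
    def __init__(self) -> None:
        self.state = "executing"
        self.acked: set[str] = set()
        self.outbox: list[dict] = []

    def on_shutdown_request(self, payload: dict) -> None:
        request_id = payload["requestId"]
        if request_id in self.acked:
            return  # duplicate delivery: the ack was already emitted
        self.state = "shutting_down"  # honored from any state, even mid-execution
        self.acked.add(request_id)
        self.outbox.append({"type": "shutdown_ack", "requestId": request_id,
                            "timestamp": time.time()})


class LeaderView:
    """The leader's per-worker view while a shutdown is in flight."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self.state = "awaiting_shutdown_ack"

    def on_ack(self, payload: dict) -> None:
        if payload.get("requestId") == self.request_id:
            self.state = "stopped"

    def worker_is_live(self) -> bool:
        return self.state != "stopped"
```

The leader keeps treating the worker as live until the matching ack is observed, so a lost or delayed ack results in a retry rather than a worker that is assumed dead while still running tools.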

Implementation Skeleton: Mailbox-Driven Event Loop

This is the control-plane loop that makes everything work:

poll mailbox for unread envelopes
parse typed payloads
apply side effects (policy updates, unblock decisions) before delivery
deliver remaining messages as chat output
periodically emit health and progress events

If you implement this loop and the two state machines above, you can ship a swarm that survives real-world failure modes.
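One pass of that loop might look like the sketch below. It assumes the mailbox is a list of envelope dicts, that handlers are looked up by payload type, and that typed payloads without a handler are shown as chat; periodic health events are omitted. `poll_once` is a hypothetical name.

```python
import json
from typing import Callable


def poll_once(mailbox: list[dict],
              handlers: dict[str, Callable[[dict], None]],
              deliver: Callable[[str], None]) -> None:
    unread = [env for env in mailbox if not env["read"]]
    chat: list[dict] = []
    # Pass 1: typed payloads apply their side effects.
    for env in unread:
        try:
            payload = json.loads(env["text"])
        except (json.JSONDecodeError, TypeError):
            payload = None
        kind = payload.get("type") if isinstance(payload, dict) else None
        handler = handlers.get(kind) if isinstance(kind, str) else None
        if handler is None:
            chat.append(env)  # plain text, or a type nobody handles
            continue
        handler(payload)
        env["read"] = True  # marked only after the effect is applied
    # Pass 2: everything else is delivered as chat output.
    for env in chat:
        deliver(env["text"])
        env["read"] = True
```

Marking an envelope read only after its effect is applied means a crash in between replays the envelope on the next poll, which is safe only if the handlers are idempotent.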

Test Checklist (Make It Repro-Grade)

Write tests that assert invariants under adversarial delivery:

duplicate plan approvals
a permission decision arrives out of order, before the matching request is recorded
poller restarts mid-flight (replay unread or re-parse text)
shutdown request arrives during a tool call
worker crashes after receiving approval but before applying policy

If these pass, the rest is details.

Related

/swarm-mailbox-protocol/
/permission-control-plane/
/tool-registry-and-execution/