Replay Safety: 3 Tool Classes on Resume

Read this as: "Can this pending tool call be safely run again?"

Failure Trap: Treating workflow replay as retry and sending the same side effect twice.
Decision Rule: Classify every tool at registration: pure may rerun, idempotent needs a stable key, unsafe must cache-hit or throw.


A crash lands after a side effect

The dangerous window is after a tool like send_email fires but before the workflow records the result durably.

The checkpoint still lists a pending tool.
The upstream side effect may already exist.
Blind replay can duplicate the action.
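
The window can be reproduced in a few lines. This is a toy simulation, not a real runtime: the checkpoint dict, the outbox list, and the Crash exception are all made up for illustration, and send_email here only appends to a list.

```python
class Crash(Exception):
    """Stands in for a process dying mid-step (illustrative only)."""


outbox = []  # what the upstream mail system has actually sent
checkpoint = {"pending": ["send_email"], "results": {}}


def send_email() -> str:
    # The side effect: once this runs, the email exists upstream.
    outbox.append("welcome")
    return "sent"


def run_pending(crash_before_record: bool) -> None:
    for name in list(checkpoint["pending"]):
        result = send_email()
        if crash_before_record:
            # The side effect fired, but nothing durable was written.
            raise Crash()
        checkpoint["results"][name] = result
        checkpoint["pending"].remove(name)


try:
    run_pending(crash_before_record=True)
except Crash:
    pass  # the checkpoint still lists send_email as pending

run_pending(crash_before_record=False)  # blind replay on resume
# outbox now holds two copies of the same email
```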

Resume replays pending work

On resume, the runtime sees pending tools and needs a harness rule before it can dispatch anything.

The runtime cannot infer side effects.
Tool names are not enough.
The harness owns the replay contract.

Inputs get a stable hash

The cache key includes workflow, node, checkpoint, tool name, and a canonical input hash.

Canonical JSON avoids key-order misses.
The key scopes results to the run.
Hash first, dispatch second.
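
One way to build such a key, as a minimal sketch. The function names and the choice of SHA-256 are assumptions, not something the harness prescribes. Sorting keys with json.dumps is enough to remove key-order misses for plain JSON inputs; it is not a full canonicalization scheme (number formatting, for instance, is left to Python's defaults).

```python
import hashlib
import json
from typing import Any, Dict


def canonical_input_hash(tool_input: Dict[str, Any]) -> str:
    """Hash the input so key order and whitespace do not change the result."""
    canonical = json.dumps(tool_input, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_key(
    workflow_id: str,
    node_id: str,
    checkpoint_id: str,
    tool_name: str,
    tool_input: Dict[str, Any],
) -> str:
    """Scope the key to the run: workflow, node, checkpoint, tool, input hash."""
    parts = [
        workflow_id,
        node_id,
        checkpoint_id,
        tool_name,
        canonical_input_hash(tool_input),
    ]
    # A JSON array keeps the parts unambiguous even if an id contains a separator.
    return json.dumps(parts)


# Same call, different key order: the keys match.
key_a = cache_key("wf-1", "node-a", "cp-3", "send_email", {"to": "a@example.com", "subject": "Hi"})
key_b = cache_key("wf-1", "node-a", "cp-3", "send_email", {"subject": "Hi", "to": "a@example.com"})
```

The key is computed before any dispatch decision, so the lookup never depends on whether the tool has run.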

Cache hit means no side effect

If a result is already recorded for that exact call, resume returns the cached value without calling the tool again.

This is safe for every class.
It also saves token and tool cost.
The cache row is the receipt.
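
The hit path might look like the sketch below. The dict stands in for a durable store, and CachedResult, lookup_recorded, and the example key string are hypothetical names. Wrapping the value lets a recorded result of None be told apart from a miss.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class CachedResult:
    """A recorded result; the wrapper distinguishes 'recorded None' from a miss."""
    value: Any


def lookup_recorded(cache: Dict[str, CachedResult], key: str) -> Optional[CachedResult]:
    """Return the recorded row for this exact call, or None on a miss."""
    return cache.get(key)


dispatched = []  # tracks real tool calls, to show that none happen on a hit


def send_email() -> dict:
    dispatched.append("send_email")
    return {"status": "sent"}


key = "wf-1/node-a/cp-3/send_email/ab12"  # placeholder key for the example
cache = {key: CachedResult({"status": "sent"})}

row = lookup_recorded(cache, key)
if row is not None:
    result = row.value  # cache hit: the tool is never called
```

What happens when the row is missing depends on the tool's class, so the miss branch is deliberately left out here.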

Three classes decide cache miss

Pure tools can rerun. Idempotent tools can rerun only with the same key. Unsafe tools cannot rerun blindly.

pure: read-only or deterministic, with no external side effect.
idempotent_with_key: caller supplies a stable key, so a repeat call is deduplicated upstream.
unsafe_on_replay: side effect cannot be deduped.
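
Classification at registration could be enforced with a small registry. This is a sketch under assumptions: ReplayClass, register_tool, REGISTRY, and the example tools (apart from send_email) are invented here, and the tool bodies are stubs.

```python
from enum import Enum
from typing import Any, Callable, Dict, NamedTuple


class ReplayClass(Enum):
    PURE = "pure"
    IDEMPOTENT_WITH_KEY = "idempotent_with_key"
    UNSAFE_ON_REPLAY = "unsafe_on_replay"


class RegisteredTool(NamedTuple):
    fn: Callable[..., Any]
    replay_class: ReplayClass


REGISTRY: Dict[str, RegisteredTool] = {}


def register_tool(name: str, replay_class: ReplayClass):
    """Decorator that refuses to register a tool without an explicit class."""
    if not isinstance(replay_class, ReplayClass):
        raise TypeError(f"{name}: replay_class must be a ReplayClass member")

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name] = RegisteredTool(fn, replay_class)
        return fn

    return decorator


@register_tool("lookup_order", ReplayClass.PURE)
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id}  # read-only stub


@register_tool("charge_card", ReplayClass.IDEMPOTENT_WITH_KEY)
def charge_card(amount_cents: int, idempotency_key: str) -> dict:
    return {"charged": amount_cents, "key": idempotency_key}  # stub


@register_tool("send_email", ReplayClass.UNSAFE_ON_REPLAY)
def send_email(to: str, body: str) -> dict:
    return {"status": "sent", "to": to}  # stub
```

Because the class is a required argument, a tool cannot reach the runtime unclassified, and the runtime never has to guess from the tool name.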

Unsafe miss throws for human recovery

For unsafe tools, a cache miss raises ReplayUnsafeError. The operator checks the upstream receipt, records what actually happened, and resumes from there.

Do not fabricate success.
Do not fabricate failure.
Do not auto-retry the side effect.
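
Put together, the resume decision might look like this sketch. ReplayUnsafeError is the name used above; resume_pending_call, its parameters, and the dict cache are assumptions for illustration. The class is passed as a plain string to keep the example self-contained.

```python
from typing import Any, Callable, Dict, Optional


class ReplayUnsafeError(Exception):
    """Raised when an unsafe tool has no recorded result on resume."""


def resume_pending_call(
    cache: Dict[str, Any],
    key: str,
    replay_class: str,
    tool: Callable[[Optional[str]], Any],
    idempotency_key: Optional[str] = None,
) -> Any:
    # Cache hit: safe for every class, and the tool is not called.
    if key in cache:
        return cache[key]

    if replay_class == "pure":
        result = tool(None)
    elif replay_class == "idempotent_with_key":
        if idempotency_key is None:
            raise ValueError("idempotent tool needs the original stable key")
        result = tool(idempotency_key)
    elif replay_class == "unsafe_on_replay":
        # No success, no failure, no retry: stop and hand off to a human.
        raise ReplayUnsafeError(f"no recorded result for {key}; check upstream")
    else:
        raise ValueError(f"unknown replay class: {replay_class}")

    cache[key] = result  # record the result before anything else depends on it
    return result


calls = []


def fake_send_email(idempotency_key: Optional[str]) -> dict:
    calls.append(idempotency_key)
    return {"status": "sent"}


cache: Dict[str, Any] = {}
try:
    resume_pending_call(cache, "k-email", "unsafe_on_replay", fake_send_email)
    outcome = "dispatched"
except ReplayUnsafeError:
    outcome = "needs operator"
```

In this sketch the error leaves the cache untouched, so nothing downstream can mistake the unresolved call for a recorded success or failure.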