Read this as: "Can this pending tool call be safely run again?"
- Failure Trap: Treating workflow replay as a retry and sending the same side effect twice.
- Decision Rule: Classify every tool at registration. Pure tools may rerun, idempotent tools need a stable key, and unsafe tools must hit the cache or throw.
A crash lands after a side effect
The dangerous window is after a tool like send_email fires but before the workflow records the result durably.
- The checkpoint still lists a pending tool.
- The upstream side effect may already exist.
- Blind replay can duplicate the action.
Resume replays pending work
On resume, the runtime sees pending tools and needs a harness rule before it can dispatch anything.
- The runtime cannot infer side effects.
- Tool names are not enough.
- The harness owns the replay contract.
Inputs get a stable hash
The cache key includes workflow, node, checkpoint, tool name, and a canonical input hash.
- Canonical JSON avoids key-order misses.
- The key scopes results to the run.
- Hash first, dispatch second.
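
A minimal sketch of how such a key could be built, assuming the tool input is JSON-serializable. The function names and the tuple layout are illustrative assumptions, not the runtime's actual API; only the five key components come from the text above.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def canonical_input_hash(tool_input: Dict[str, Any]) -> str:
    """Hash tool input so that dict key order cannot change the digest."""
    # sort_keys orders keys at every nesting level; compact separators keep
    # the serialized form free of optional whitespace.
    canonical = json.dumps(tool_input, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay_cache_key(
    workflow_id: str,
    node_id: str,
    checkpoint_id: str,
    tool_name: str,
    tool_input: Dict[str, Any],
) -> Tuple[str, str, str, str, str]:
    """Build the key before any dispatch (hypothetical helper)."""
    return (
        workflow_id,
        node_id,
        checkpoint_id,
        tool_name,
        canonical_input_hash(tool_input),
    )


# Same call, different key order in the input dict.
key_a = replay_cache_key("wf-1", "node-3", "cp-7", "send_email",
                         {"to": "ops@example.com", "subject": "hi"})
key_b = replay_cache_key("wf-1", "node-3", "cp-7", "send_email",
                         {"subject": "hi", "to": "ops@example.com"})
print(key_a == key_b)  # True: key order does not affect the hash
```

Returning a tuple rather than a joined string avoids ambiguity if an identifier happens to contain the separator character.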
Cache hit means no side effect
If a result is already recorded for that exact call, resume returns the cached value without calling the tool again.
- This is safe for every class.
- It also saves token and tool cost.
- The cache row is the receipt.
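
The hit path can be sketched as follows, assuming an in-memory dict stands in for the durable result store. `resume_pending_call` and `on_miss` are hypothetical names; what happens on a miss depends on the tool's class and is left to the caller here.

```python
from typing import Any, Callable, Dict, Hashable

_MISS = object()  # sentinel so a recorded None still counts as a hit


def resume_pending_call(
    cache: Dict[Hashable, Any],
    key: Hashable,
    on_miss: Callable[[], Any],
) -> Any:
    """Return the recorded result if one exists; otherwise defer to on_miss."""
    recorded = cache.get(key, _MISS)
    if recorded is not _MISS:
        return recorded  # cache hit: the tool is never invoked
    return on_miss()     # cache miss: class-specific handling decides


# The email was sent and recorded before the crash.
sent = []
cache = {("wf-1", "send_email", "abc123"): {"message_id": "m-42"}}
result = resume_pending_call(
    cache,
    ("wf-1", "send_email", "abc123"),
    on_miss=lambda: sent.append("email"),
)
print(result, sent)  # {'message_id': 'm-42'} []
```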
Three classes decide what happens on a cache miss
Pure tools can rerun. Idempotent tools can rerun only with the same key. Unsafe tools cannot rerun blindly.
- pure: read-only or deterministic.
- idempotent_with_key: caller supplies stable key.
- unsafe_on_replay: side effect cannot be deduped.
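
One way to encode the three classes at registration, as a sketch. The class values come from the list above; `ReplayClass`, `TOOL_REGISTRY`, `register_tool`, `miss_action`, and the example tool names other than `send_email` are hypothetical.

```python
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class ReplayClass(Enum):
    PURE = "pure"
    IDEMPOTENT_WITH_KEY = "idempotent_with_key"
    UNSAFE_ON_REPLAY = "unsafe_on_replay"


TOOL_REGISTRY: Dict[str, Tuple[Callable[..., Any], ReplayClass]] = {}


def register_tool(name: str, fn: Callable[..., Any], replay_class: ReplayClass) -> None:
    """Refuse to register a tool that has no explicit replay class."""
    if not isinstance(replay_class, ReplayClass):
        raise TypeError(f"tool {name!r} must declare a ReplayClass")
    TOOL_REGISTRY[name] = (fn, replay_class)


def miss_action(replay_class: ReplayClass) -> str:
    """Map a tool's class to what resume may do on a cache miss."""
    if replay_class is ReplayClass.PURE:
        return "rerun"
    if replay_class is ReplayClass.IDEMPOTENT_WITH_KEY:
        return "rerun_with_same_key"
    return "halt_for_operator"


register_tool("lookup_user", lambda user_id: {"id": user_id}, ReplayClass.PURE)
register_tool("charge_card", lambda amount, key: "ok", ReplayClass.IDEMPOTENT_WITH_KEY)
register_tool("send_email", lambda to, body: "sent", ReplayClass.UNSAFE_ON_REPLAY)

for name, (_, cls) in TOOL_REGISTRY.items():
    print(name, "->", miss_action(cls))
```

Making the class a required argument means an unclassified tool fails at registration rather than at resume time.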
Unsafe miss throws for human recovery
For unsafe tools, a cache miss raises ReplayUnsafeError. An operator checks the upstream receipt to learn whether the side effect already happened, then resolves the run forward.
- Do not fabricate success.
- Do not fabricate failure.
- Do not auto-retry the side effect.
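
A sketch of the unsafe path, again with a dict standing in for the durable store. `ReplayUnsafeError` is named in the text; its fields, `resume_unsafe_tool`, and `record_operator_resolution` are assumptions about how a harness might expose this, including the idea that resolving forward means recording the outcome the operator verified upstream.

```python
from typing import Any, Dict, Hashable


class ReplayUnsafeError(RuntimeError):
    """Raised when an unsafe tool has no recorded result on resume."""

    def __init__(self, tool_name: str, cache_key: Hashable) -> None:
        super().__init__(
            f"{tool_name}: no recorded result for {cache_key!r}; "
            "check the upstream receipt before resolving"
        )
        self.tool_name = tool_name
        self.cache_key = cache_key


def resume_unsafe_tool(cache: Dict[Hashable, Any], key: Hashable, tool_name: str) -> Any:
    """Return the recorded result, or stop. Never call the tool here."""
    if key in cache:
        return cache[key]
    raise ReplayUnsafeError(tool_name, key)


def record_operator_resolution(cache: Dict[Hashable, Any], key: Hashable, verified: Any) -> None:
    """Store the outcome an operator verified upstream (hypothetical helper)."""
    cache[key] = verified


cache: Dict[Hashable, Any] = {}
key = ("wf-1", "node-3", "cp-7", "send_email", "abc123")

try:
    resume_unsafe_tool(cache, key, "send_email")
except ReplayUnsafeError as err:
    print("halted:", err.tool_name)
    # The operator finds the message in the mail provider's log and records it.
    record_operator_resolution(cache, err.cache_key, {"message_id": "m-42"})

print(resume_unsafe_tool(cache, key, "send_email"))  # {'message_id': 'm-42'}
```

The harness neither invents a result nor re-sends: the only way past the error is a recorded outcome.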