Replay Safety: 3 Tool Classes on Resume

See how a crashed workflow routes pure, idempotent, and unsafe tools differently when it resumes.

Read this as: can this pending tool call be safely run again?
Failure Trap
Treating workflow replay as retry and sending the same side effect twice.
Decision Rule
Classify every tool at registration: pure may rerun, idempotent needs a stable key, unsafe must cache-hit or throw.
[Diagram: six panels. Mid-call crash (send_email, worker dies, pending tool); Resume (load state, pending tool, need rule); Input hash (canonical JSON, sha256, lookup key); Cache hit (return result, no re-call, safe path); Classify (pure, idempotent, unsafe); Throw (cache miss, unsafe tool, HITL resolve, no replay).]
A crash lands after a side effect

The dangerous window is after a tool like send_email fires but before the workflow records the result durably.

  • The checkpoint still lists a pending tool.
  • The upstream side effect may already exist.
  • Blind replay can duplicate the action.
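
The window described above can be reproduced in a few lines. This is a minimal simulation, not a real runtime: the `checkpoint` dict shape, `run_pending`, `WorkerCrash`, and the in-memory `outbox` standing in for the upstream mail system are all hypothetical names chosen for illustration.

```python
class WorkerCrash(Exception):
    """Stands in for the worker process dying mid-call (illustrative)."""


outbox = []  # hypothetical upstream system, e.g. a mail provider


def send_email(to, body):
    outbox.append((to, body))  # the side effect happens here
    return {"status": "sent"}


def run_pending(checkpoint, crash_before_record=False):
    """Naive dispatcher: runs every pending call with no replay rule."""
    for call in checkpoint["pending"]:
        result = send_email(**call["args"])
        if crash_before_record:
            # The side effect exists upstream, but nothing durable says so.
            raise WorkerCrash("died after side effect, before durable write")
        checkpoint["results"][call["id"]] = result
    checkpoint["pending"] = []


checkpoint = {
    "pending": [{"id": "call-1", "args": {"to": "a@example.com", "body": "hi"}}],
    "results": {},
}

try:
    run_pending(checkpoint, crash_before_record=True)
except WorkerCrash:
    pass  # the checkpoint still lists call-1 as pending

run_pending(checkpoint)  # blind replay on resume
print(len(outbox))  # 2: the same email was sent twice
```

The second call succeeds and the checkpoint looks clean afterwards, which is why the duplicate is easy to miss without an explicit replay rule.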

Resume replays pending work

On resume, the runtime sees pending tools and needs a harness rule before it can dispatch anything.

  • The runtime cannot infer side effects.
  • Tool names are not enough.
  • The harness owns the replay contract.

Inputs get a stable hash

The cache key includes workflow, node, checkpoint, tool name, and a canonical input hash.

  • Canonical JSON avoids key-order misses.
  • The key scopes results to the run.
  • Hash first, dispatch second.
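
One way to build such a key is sketched below. It assumes tool inputs are JSON-serializable and uses `json.dumps` with `sort_keys=True` as a simple canonical form; that is a stand-in, not a full canonicalization standard such as RFC 8785. The function names and the tuple layout of the key are illustrative assumptions.

```python
import hashlib
import json


def canonical_input_hash(args: dict) -> str:
    """Hash tool inputs so that key order and whitespace do not matter."""
    canonical = json.dumps(
        args,
        sort_keys=True,          # same bytes regardless of dict key order
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_key(workflow_id, node_id, checkpoint_id, tool_name, args):
    """Scope a recorded result to one call within one run (hypothetical layout)."""
    return (workflow_id, node_id, checkpoint_id, tool_name,
            canonical_input_hash(args))


k1 = cache_key("wf-1", "notify", "cp-7", "send_email",
               {"to": "a@example.com", "body": "hi"})
k2 = cache_key("wf-1", "notify", "cp-7", "send_email",
               {"body": "hi", "to": "a@example.com"})
print(k1 == k2)  # True: key order does not cause a miss
```

Returning a tuple rather than a joined string avoids delimiter collisions if an identifier happens to contain the separator character.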

Cache hit means no side effect

If a result is already recorded for that exact call, resume returns the cached value without calling the tool again.

  • This is safe for every class.
  • It also saves token and tool cost.
  • The cache row is the receipt.
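
The hit path can be sketched as a lookup that short-circuits before any dispatch. The `resume_call` function, the plain dict used as a cache, and the `on_miss` callback are assumptions made for this example; a real harness would back the cache with durable storage.

```python
_MISS = object()  # sentinel so that a cached None still counts as a hit

tool_calls = {"send_email": 0}


def send_email(to, body):
    tool_calls["send_email"] += 1  # count real dispatches
    return {"status": "sent"}


def resume_call(cache, key, tool, args, on_miss):
    """Return the recorded result if present; otherwise defer to the class rule."""
    cached = cache.get(key, _MISS)
    if cached is not _MISS:
        return cached  # cache hit: the tool is never invoked
    return on_miss(cache, key, tool, args)


def refuse(cache, key, tool, args):
    # Placeholder miss handler; the per-class rule would go here.
    raise RuntimeError("cache miss: apply the tool's replay class")


key = ("wf-1", "notify", "cp-7", "send_email", "abc123")  # illustrative key
cache = {key: {"status": "sent"}}  # row written before the crash

result = resume_call(cache, key, send_email,
                     {"to": "a@example.com", "body": "hi"}, refuse)
print(result, tool_calls["send_email"])
```

Because the lookup happens before dispatch, the hit path behaves the same for pure, idempotent, and unsafe tools.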

Three classes decide cache miss

Pure tools can rerun. Idempotent tools can rerun only with the same key. Unsafe tools cannot rerun blindly.

  • pure: read-only or deterministic.
  • idempotent_with_key: caller supplies stable key.
  • unsafe_on_replay: side effect cannot be deduped.
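
Classification at registration might look like the following sketch. The class names come from the list above; the `register` decorator, the `REGISTRY` dict, the `may_rerun_on_miss` helper, and the three example tools are hypothetical.

```python
from enum import Enum


class ReplayClass(Enum):
    PURE = "pure"
    IDEMPOTENT_WITH_KEY = "idempotent_with_key"
    UNSAFE_ON_REPLAY = "unsafe_on_replay"


REGISTRY = {}


def register(name, replay_class):
    """Force every tool to declare its replay class when it is registered."""
    def decorator(fn):
        REGISTRY[name] = (fn, replay_class)
        return fn
    return decorator


@register("get_weather", ReplayClass.PURE)
def get_weather(city):
    return {"city": city, "temp_c": 21}  # read-only, safe to rerun


@register("charge_card", ReplayClass.IDEMPOTENT_WITH_KEY)
def charge_card(amount, idempotency_key):
    return {"charged": amount, "key": idempotency_key}


@register("send_email", ReplayClass.UNSAFE_ON_REPLAY)
def send_email(to, body):
    return {"status": "sent"}


def may_rerun_on_miss(name, idempotency_key=None):
    """Decide whether a cache miss may be answered by calling the tool again."""
    _, replay_class = REGISTRY[name]
    if replay_class is ReplayClass.PURE:
        return True
    if replay_class is ReplayClass.IDEMPOTENT_WITH_KEY:
        return idempotency_key is not None  # only with the same stable key
    return False  # unsafe_on_replay: never rerun blindly


print(may_rerun_on_miss("get_weather"))
print(may_rerun_on_miss("charge_card", idempotency_key="wf-1:cp-7"))
print(may_rerun_on_miss("send_email"))
```

Making the class a required argument of `register` means an unclassified tool cannot enter the registry at all, so the decision is not deferred to resume time.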

Unsafe miss throws for human recovery

For unsafe tools, a cache miss raises ReplayUnsafeError instead of dispatching. The operator checks the upstream receipt and resolves forward.

  • Do not fabricate success.
  • Do not fabricate failure.
  • Do not auto-retry the side effect.
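
A rough sketch of the unsafe miss path follows. Only the name `ReplayUnsafeError` comes from the text above; its fields, the `resume_unsafe` and `resolve_forward` functions, and the dict cache are assumptions for illustration.

```python
class ReplayUnsafeError(Exception):
    """Raised when an unsafe tool has no recorded result on resume."""

    def __init__(self, tool_name, key):
        super().__init__(f"no recorded result for unsafe tool {tool_name!r}")
        self.tool_name = tool_name
        self.key = key


def resume_unsafe(cache, key, tool_name):
    """Return the recorded result, or stop; never dispatch the tool here."""
    if key in cache:
        return cache[key]
    # No fabricated success, no fabricated failure, no automatic retry.
    raise ReplayUnsafeError(tool_name, key)


def resolve_forward(cache, key, verified_result):
    """Operator step: record what the upstream receipt actually shows."""
    cache[key] = verified_result


cache = {}
key = ("wf-1", "notify", "cp-7", "send_email", "abc123")  # illustrative key

try:
    resume_unsafe(cache, key, "send_email")
except ReplayUnsafeError as err:
    # A human checks the mail provider, then records the verified outcome.
    resolve_forward(cache, err.key, {"status": "sent", "verified_by": "operator"})

print(resume_unsafe(cache, key, "send_email"))
```

After the operator records the verified outcome, the next resume takes the ordinary cache-hit path, so the workflow moves forward without the tool being called again.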