Prompt Cache: Stable Prefix vs Volatile Tail

See why prompt-cache stability is a source-level boundary, not a billing toggle.

Read this as: which prompt bytes must stay identical across turns?
Failure Trap
Editing dynamic content before the cache boundary and silently rewriting the whole prefix.
Decision Rule
Split stable prefix from volatile tail, memoize session-stable values, and instrument cache breaks.
[Figure: six panels. Boundary: stable prefix and dynamic tail, visibly split. Cache hit: stable prefix read at the 0.1x tier while the tail varies. Bad edit: bytes drift before the boundary and the cache key is lost. Recompute: changed prefix misses the old entry and triggers a new write. Write tier: new prefix pays the 1.25x write premium to warm again. Trailing cost: 50K-70K tokens of lost reads; alert on the delta.]

Split the prompt at a visible boundary

A stable prefix can be cached only if its bytes remain identical. Dynamic content belongs after the boundary.

  • The boundary is a code artifact.
  • Reviewers can see which side changed.
  • Unknown sections start in the volatile tail.
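
As a minimal sketch of what a boundary "as a code artifact" could look like, the builder below keeps two explicit lists and sends any section to the volatile tail unless the caller opts in to the stable prefix. The `Prompt` class, its `add` method, and the example section strings are hypothetical names chosen for illustration, not an API from the source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Prompt:
    """Hypothetical prompt builder with an explicit cache boundary."""
    stable: List[str] = field(default_factory=list)
    volatile: List[str] = field(default_factory=list)

    def add(self, text: str, *, stable: bool = False) -> None:
        # Sections default to the volatile tail; moving one into the
        # prefix requires an explicit stable=True that shows up in review.
        (self.stable if stable else self.volatile).append(text)

    def render(self) -> Tuple[str, str]:
        # Returns (prefix, tail); only the prefix is meant to be cached.
        return "\n".join(self.stable), "\n".join(self.volatile)


prompt = Prompt()
prompt.add("You are a code assistant.", stable=True)
prompt.add("Tool schema: read_file, write_file", stable=True)
prompt.add("Current turn: user asked about caching")
prefix, tail = prompt.render()
```

Because the side a section lands on is decided by one keyword argument, a diff that moves content across the boundary is visible to a reviewer.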

Stable prefix reads are cheap

When the prefix hits cache, the source chapter frames the read tier as about 0.1x base input cost.

  • The large prefix is reused.
  • Only the tail pays normal input cost.
  • Long conversations depend on this stability.
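
The read discount can be put into numbers with a short sketch. It assumes costs are measured in base-input-token equivalents and uses the approximate 0.1x read multiplier quoted above; the function name and the token counts are illustrative assumptions.

```python
READ_MULTIPLIER = 0.1  # approximate cached-read tier quoted in the text


def turn_cost(prefix_tokens: int, tail_tokens: int, *, cached: bool) -> float:
    """Cost of one turn in base-input-token equivalents (illustrative)."""
    prefix_rate = READ_MULTIPLIER if cached else 1.0
    # The tail always pays the normal input rate.
    return prefix_tokens * prefix_rate + tail_tokens


# Illustrative sizes: a 30,000-token prefix and a 1,000-token tail.
with_hit = turn_cost(30_000, 1_000, cached=True)
without_cache = turn_cost(30_000, 1_000, cached=False)
```

With these example sizes the prefix dominates the bill, which is why a long conversation depends so heavily on the prefix staying stable.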

A bad edit before the boundary changes bytes

A timestamp, request id, or per-turn section in the prefix changes the cache key.

  • One byte can break the prefix.
  • Memoize date-like values per session.
  • Use explicit helpers for uncached sections.
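
One way to memoize a date-like value per session is sketched below. It assumes a hypothetical `Session` object that lives for the whole conversation and takes an injectable clock; `functools.cached_property` then computes the date once and reuses the same string on every later turn.

```python
from datetime import datetime, timezone
from functools import cached_property
from typing import Callable


class Session:
    """Hypothetical per-conversation object holding session-stable values."""

    def __init__(
        self,
        clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self._clock = clock

    @cached_property
    def today(self) -> str:
        # Evaluated on first access only; later accesses return the stored
        # string, so the prefix bytes do not change if midnight passes.
        return self._clock().date().isoformat()

    def prefix(self) -> str:
        return f"System rules\nToday's date: {self.today}"
```

The trade-off is that a session running past midnight keeps reporting the date it started on. If the model needs the live time, that value would go in the volatile tail instead.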

The cache break forces recompute

After a break, the system must treat the large prefix as new input again.

  • The bill spike is a lagging signal.
  • Telemetry should catch the event.
  • The code review is the leading signal.
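
Telemetry for the break event itself can be as simple as fingerprinting the prefix on every turn and logging when the fingerprint changes. The sketch below is one assumed approach using a SHA-256 digest; the `PrefixWatch` class and the logger name are hypothetical.

```python
import hashlib
import logging
from typing import Optional

logger = logging.getLogger("prompt_cache")


class PrefixWatch:
    """Hypothetical watcher that flags a change in the prefix bytes."""

    def __init__(self) -> None:
        self._last: Optional[str] = None
        self.breaks = 0

    def observe(self, prefix: str) -> bool:
        """Return True if the prefix differs from the previous turn's."""
        digest = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        broke = self._last is not None and digest != self._last
        if broke:
            self.breaks += 1
            logger.warning(
                "prefix fingerprint changed: %s -> %s",
                self._last[:12],
                digest[:12],
            )
        self._last = digest
        return broke
```

This catches the event on the turn it happens, before any billing report reflects it. It only detects local byte drift and does not confirm what the provider's cache actually did.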

Cache writes cost more than reads

The chapter frames a cache write as about 1.25x base input cost, roughly 12.5 times the 0.1x read tier.

  • The break loses the read discount.
  • It also pays the write premium.
  • The same bytes become expensive again.
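
A rough sketch of the immediate penalty on the turn where the break happens, using the approximate 0.1x read and 1.25x write multipliers from the text. Costs are in base-input-token equivalents, and the function names and prefix size are illustrative assumptions.

```python
READ_MULT = 0.1    # approximate cached-read tier quoted in the text
WRITE_MULT = 1.25  # approximate cache-write tier quoted in the text


def prefix_cost(prefix_tokens: int, *, hit: bool) -> float:
    """Prefix cost for one turn: read tier on a hit, write tier on a miss."""
    return prefix_tokens * (READ_MULT if hit else WRITE_MULT)


def break_penalty(prefix_tokens: int) -> float:
    """Extra cost of the breaking turn versus the hit it would have been."""
    return prefix_cost(prefix_tokens, hit=False) - prefix_cost(prefix_tokens, hit=True)


# Illustrative 40,000-token prefix.
penalty = break_penalty(40_000)
```

The penalty is the lost discount plus the write premium, so it scales linearly with prefix size.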

Trailing damage continues until warm

The immediate write premium is only the first cost. Later turns lose the cheap read path until the new entry is warm.

  • The source cites 50K to 70K wasted tokens per break.
  • Long sessions amplify the damage.
  • Instrument cache-break deltas, not just bills.
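
Instrumenting deltas could look like the sketch below. It assumes each turn's usage record exposes cached-read and cache-write token counts; the `TurnUsage` fields, the `min_drop` threshold, and the sample numbers are hypothetical and would need mapping onto whatever the provider actually reports.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TurnUsage:
    """Hypothetical per-turn usage record."""
    turn: int
    cache_read_tokens: int
    cache_write_tokens: int


def cache_break_deltas(
    history: List[TurnUsage], min_drop: int = 2_000
) -> List[Tuple[int, int]]:
    """Return (turn, dropped_read_tokens) for each suspected cache break."""
    events = []
    for prev, cur in zip(history, history[1:]):
        # A sharp fall in cached-read tokens between consecutive turns
        # suggests the prefix stopped matching the cached entry.
        drop = prev.cache_read_tokens - cur.cache_read_tokens
        if drop >= min_drop:
            events.append((cur.turn, drop))
    return events


history = [
    TurnUsage(1, 0, 30_000),      # cold start: prefix written
    TurnUsage(2, 30_000, 500),    # warm: prefix read from cache
    TurnUsage(3, 30_500, 400),
    TurnUsage(4, 0, 31_000),      # break: reads collapse, prefix rewritten
    TurnUsage(5, 31_000, 300),
]
events = cache_break_deltas(history)
```

Alerting on the per-turn drop surfaces the break when it happens and records how many cached tokens were lost, which an aggregate bill does not show.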