Read this as: which prompt bytes must stay identical across turns?
- Failure Trap: editing dynamic content before the cache boundary, which silently rewrites the whole prefix.
- Decision Rule: split the stable prefix from the volatile tail, memoize session-stable values, and instrument cache breaks.
Split the prompt at a visible boundary
A stable prefix can be cached only if its bytes remain identical. Dynamic content belongs after the boundary.
- The boundary is a code artifact.
- Reviewers can see which side changed.
- Unknown sections start in the volatile tail.
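One way to make the boundary a code artifact is a small container that keeps the two sides in separate lists. This is a minimal sketch, not a real library API: `PromptParts`, `add`, and `render` are hypothetical names, and how the boundary is actually communicated to a provider is left out.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Hypothetical container that keeps the cache boundary explicit."""

    stable_prefix: list[str] = field(default_factory=list)
    volatile_tail: list[str] = field(default_factory=list)

    def add(self, text: str, *, stable: bool = False) -> None:
        # Sections land in the volatile tail unless the caller opts in,
        # so an unknown section cannot silently change the prefix.
        target = self.stable_prefix if stable else self.volatile_tail
        target.append(text)

    def render(self) -> tuple[str, str]:
        # Return both sides separately; the boundary sits between them.
        return "\n".join(self.stable_prefix), "\n".join(self.volatile_tail)


parts = PromptParts()
parts.add("System rules and tool definitions.", stable=True)
parts.add("User message for turn 1.")
prefix_turn_1, _ = parts.render()

parts.add("User message for turn 2.")
prefix_turn_2, tail_turn_2 = parts.render()
```

Because promotion to the prefix requires an explicit `stable=True`, a reviewer can see in the diff which side of the boundary a change touches.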
Stable prefix reads are cheap
When the prefix hits cache, the source chapter frames the read tier as about 0.1x base input cost.
- The large prefix is reused.
- Only the tail pays normal input cost.
- Long conversations depend on this stability.
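The read discount can be put into rough numbers. The sketch below assumes the approximate 0.1x read tier quoted above and measures cost in units of base-priced input tokens; the function name and the 40,000 and 2,000 token sizes are illustrative assumptions, not figures from the chapter.

```python
READ_MULTIPLIER = 0.1  # approximate read tier, as framed in the chapter


def turn_input_cost(prefix_tokens: int, tail_tokens: int, prefix_cached: bool) -> float:
    """Input cost of one turn, in units of base-priced input tokens."""
    # A cached prefix is billed at the read tier; the tail always pays full price.
    prefix_rate = READ_MULTIPLIER if prefix_cached else 1.0
    return prefix_tokens * prefix_rate + tail_tokens


warm = turn_input_cost(40_000, 2_000, prefix_cached=True)
uncached = turn_input_cost(40_000, 2_000, prefix_cached=False)
```

With these illustrative sizes the warm turn costs roughly 6,000 base-token equivalents against 42,000 without caching, which is why long conversations depend on the prefix staying stable.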
A bad edit before the boundary changes bytes
A timestamp, request id, or per-turn section in the prefix changes the cache key.
- One byte can break the prefix.
- Memoize date-like values per session.
- Use explicit helpers for uncached sections.
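The two remedies can be sketched as follows, assuming a per-session object is available. `SessionMemo` and `uncached_section` are hypothetical helpers written for illustration; the request id string is a made-up example.

```python
from datetime import date
from typing import Callable


class SessionMemo:
    """Compute a date-like value once, then reuse the same string all session."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def get(self, key: str, compute: Callable[[], str]) -> str:
        # Only the first call runs compute(); later turns see identical bytes.
        if key not in self._values:
            self._values[key] = compute()
        return self._values[key]


def uncached_section(text: str) -> dict[str, object]:
    """Mark a per-turn section so it is only ever placed in the volatile tail."""
    return {"text": text, "cacheable": False}


memo = SessionMemo()
calls: list[int] = []


def today() -> str:
    calls.append(1)  # count how often the value is actually recomputed
    return date.today().isoformat()


first = memo.get("current_date", today)
second = memo.get("current_date", today)
request_note = uncached_section("request id: example-123")
```

The memoized date may go stale if a session crosses midnight; that trade-off is accepted here in exchange for a byte-stable prefix.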
The cache break forces recompute
After a break, the system must treat the large prefix as new input again.
- The bill spike is a lagging signal.
- Telemetry should catch the event.
- The code review is the leading signal.
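A client-side check can catch the event on the turn it happens, before any bill arrives. The sketch below hashes the rendered prefix each turn and counts changes; `PrefixWatch` is a hypothetical name, and a real system would emit a metric or log line where this one only increments a counter.

```python
import hashlib
from typing import Optional


class PrefixWatch:
    """Detect prefix changes by comparing a digest of the prefix across turns."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None
        self.breaks = 0

    def check(self, prefix: str) -> bool:
        digest = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        # The first turn has nothing to compare against, so it is never a break.
        broke = self._last_digest is not None and digest != self._last_digest
        if broke:
            self.breaks += 1
        self._last_digest = digest
        return broke


watch = PrefixWatch()
turn_1 = watch.check("System rules.")
turn_2 = watch.check("System rules.")
turn_3 = watch.check("System rules. Now: 12:00:01")  # a timestamp leaked in
```

Here `turn_3` is flagged because a single added timestamp changes the digest, which mirrors how one byte can break the prefix.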
Cache writes cost more than reads
The chapter frames the cache write tier as about 1.25x base input cost, roughly 12.5 times the 0.1x read tier.
- The break loses the read discount.
- It also pays the write premium.
- The same bytes become expensive again.
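The gap between the two tiers gives the extra cost of the breaking turn itself. This sketch assumes the approximate 0.1x read and 1.25x write multipliers quoted in this section; the function name and the 40,000-token prefix are illustrative assumptions.

```python
READ_MULTIPLIER = 0.1    # approximate read tier, as framed in the chapter
WRITE_MULTIPLIER = 1.25  # approximate write tier, as framed in the chapter


def break_premium(prefix_tokens: int) -> float:
    """Extra cost of rewriting the prefix instead of reading it, in base-token units."""
    return prefix_tokens * (WRITE_MULTIPLIER - READ_MULTIPLIER)


premium = break_premium(40_000)
```

For the illustrative 40,000-token prefix the premium is about 46,000 base-token equivalents on that one turn, since each prefix token costs roughly 1.15x more than it would have on a cache hit.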
Trailing damage continues until warm
The immediate write premium is only the first cost. Later turns lose the cheap read path until the new entry is warm.
- The source cites 50K to 70K wasted tokens per break.
- Long sessions amplify the damage.
- Instrument cache-break deltas, not just bills.
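Instrumenting deltas can be as simple as comparing cached-read token counts between consecutive turns. The sketch assumes the provider reports a per-turn cached-read count that has been collected into a list; `cache_break_deltas` and the sample numbers are hypothetical.

```python
def cache_break_deltas(cache_read_tokens: list[int]) -> list[tuple[int, int]]:
    """Return (turn_index, tokens_lost) for each turn whose cached read drops."""
    events: list[tuple[int, int]] = []
    for turn in range(1, len(cache_read_tokens)):
        lost = cache_read_tokens[turn - 1] - cache_read_tokens[turn]
        # A drop means tokens that were read from cache last turn were not this turn.
        if lost > 0:
            events.append((turn, lost))
    return events


# Illustrative session: cold start, two warm turns, a break, then warm again.
reads_per_turn = [0, 50_000, 50_500, 0, 51_000]
events = cache_break_deltas(reads_per_turn)
```

Each event names the turn and the size of the drop, so a break can be traced to the change that caused it instead of surfacing later as an unexplained bill.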