
Compaction As Failure Recovery

Summary

A source-level reading of Flue compaction as threshold measurement, cut-point selection, summary entry creation, usage accounting, and overflow retry.

Compaction is easy to under-teach. If you describe it as “summarize old messages when context gets long,” you miss the architecture.

In Flue, compaction is a runtime state transition. It measures context, chooses a valid cut point, serializes older conversation, asks the model for a structured checkpoint, appends a compaction entry to the session tree, records usage, and sometimes retries after a provider context overflow.

The source pin for this chapter is withastro/flue@dbaa9effa305561c627c6836559f8a0cbce67875.

Cut Point Workbench

Compaction is only safe when the chosen cut preserves a runnable suffix.

Domain Word

In Flue, compaction is a state transition that replaces old active-path detail with a summary entry while preserving enough recent context for the session to continue.

The invariant is: context reduction must keep the conversation runnable and must not erase the accounting and recovery facts the harness needs.

The twelve-factor pressure is disposability. Long-running headless agents need recovery behavior when context pressure or provider overflow would otherwise kill the run.

The State Machine

COMPACTION PATHS

normal prompt turn
  -> assistant message arrives

  stopReason is context overflow
    -> remove failed assistant leaf
    -> runCompaction("overflow", willRetry=true)
    -> append compaction entry
    -> rebuild context
    -> continue rented Agent

  usage near context limit
    -> runCompaction("threshold", willRetry=false)
    -> append compaction entry
    -> rebuild context
The threshold path is proactive. The overflow path is recovery. The overflow path is the one that makes compaction a reliability feature, not just a cost feature.

Settings And Measurement

packages/runtime/src/compaction.ts defines:

export const DEFAULT_COMPACTION_SETTINGS = {
  enabled: true,
  reserveTokens: 16384,
  keepRecentTokens: 20000,
};

The threshold trigger is intentionally tied to provider-reported usage from the latest assistant message. Session.checkCompaction(...) calls calculateContextTokens(assistantMessage.usage) and passes that value to shouldCompact(...), which checks whether it exceeds contextWindow - reserveTokens.

If the model has no known contextWindow, threshold compaction is skipped. Overflow recovery can still run because it is driven by the provider’s error signal.
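
The trigger decision described above can be sketched as a small pure function. This is a minimal sketch, not the pinned source: planCompactionSketch, TurnOutcome, and CompactionPlan are hypothetical names, and checking overflow before the threshold is an assumption taken from the path diagram rather than from the code.

```typescript
// Hypothetical sketch of the trigger decision; not Flue's actual API.
type CompactionReason = "overflow" | "threshold";

interface TurnOutcome {
  overflowed: boolean; // provider reported a context overflow for this turn
  contextTokens: number; // provider-reported usage on the latest assistant message
}

interface CompactionPlan {
  reason: CompactionReason;
  willRetry: boolean;
}

function planCompactionSketch(
  turn: TurnOutcome,
  contextWindow: number | undefined,
  reserveTokens: number,
): CompactionPlan | null {
  // Overflow recovery is driven by the provider's error signal, so it does
  // not need a known context window.
  if (turn.overflowed) {
    return { reason: "overflow", willRetry: true };
  }
  // Without a known window there is no threshold to compare against.
  if (contextWindow === undefined) {
    return null;
  }
  // Threshold path: compact once usage exceeds the window minus the reserve.
  if (turn.contextTokens > contextWindow - reserveTokens) {
    return { reason: "threshold", willRetry: false };
  }
  return null;
}
```

Passing undefined for the window returns null unless the turn overflowed, which mirrors the behavior described above: threshold compaction is skipped, overflow recovery is not.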

Token measurement has two jobs:

Job | Source
Threshold trigger | Provider usage on the latest assistant message.
Cut planning and tokensBefore | estimateContextTokens(...), which combines provider usage with conservative character estimates for trailing messages.

That split is practical. Flue reacts to real provider usage when deciding whether the window is under pressure, then uses a conservative estimate when choosing what to summarize and what to keep.
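
To make the second job concrete, here is a minimal sketch of combining provider usage with character-based estimates for trailing messages. The message and usage shapes are hypothetical, and the four-characters-per-token ratio is an assumption for illustration; the pinned source may use a different heuristic.

```typescript
// Hypothetical shapes; Flue's real message and usage types may differ.
interface UsageSketch {
  input: number;
  output: number;
}

interface MessageSketch {
  role: "user" | "assistant" | "toolResult";
  text: string;
  usage?: UsageSketch; // only assistant messages carry provider usage
}

// Assumed ratio for the character-based estimate (about 4 chars per token).
const CHARS_PER_TOKEN = 4;

function estimateContextTokensSketch(messages: MessageSketch[]): number {
  // Find the most recent assistant message that has provider usage.
  let anchor = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m && m.role === "assistant" && m.usage) {
      anchor = i;
      break;
    }
  }

  // Provider usage accounts for the context up to and including the anchor.
  let total = 0;
  const anchorUsage = anchor >= 0 ? messages[anchor]?.usage : undefined;
  if (anchorUsage) {
    total = anchorUsage.input + anchorUsage.output;
  }

  // Messages after the anchor have no provider numbers yet, so estimate
  // them from character counts, rounding up.
  for (const m of messages.slice(anchor + 1)) {
    total += Math.ceil(m.text.length / CHARS_PER_TOKEN);
  }
  return total;
}
```

When no assistant message has usage, the sketch falls back to estimating every message from its length.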

Cut Points

prepareCompaction(...) is pure: no model call, no store write, no session mutation. It finds what should be summarized and what should remain.

The cut-point rules are deliberately constrained:

Rule | Why
Valid cut points are user or assistant messages, never tool results. | Keeping a tool result without its surrounding turn breaks conversation shape.
Recent tail is kept based on keepRecentTokens. | The model needs fresh work, not only a high-level summary.
Prior compaction details can be carried forward. | Repeated compaction should preserve known file-operation context.
Split turns get a separate prefix summary. | If the cut falls inside a turn, the retained suffix still needs early-turn context.
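
The first two rules can be sketched as a backward walk over the active path. This is an illustration under stated assumptions, not the body of prepareCompaction(...): EntrySketch and findCutIndexSketch are hypothetical, per-entry token counts are taken as given, and moving the cut earlier (rather than later) when it lands on a tool result is an assumed choice. Split-turn handling and carried-forward details are left out.

```typescript
type RoleSketch = "user" | "assistant" | "toolResult";

interface EntrySketch {
  id: string;
  role: RoleSketch;
  tokens: number; // estimated size of this entry
}

// Returns the index of the first entry to keep; entries before it would be
// summarized. Returns 0 when the whole path fits in the recent-tail budget.
function findCutIndexSketch(entries: EntrySketch[], keepRecentTokens: number): number {
  let kept = 0;
  let cut = 0;
  // Walk from the newest entry backward until the recent tail is large enough.
  for (let i = entries.length - 1; i >= 0; i--) {
    kept += entries[i]?.tokens ?? 0;
    if (kept >= keepRecentTokens) {
      cut = i;
      break;
    }
  }
  // Never start the kept suffix on a tool result: move the cut earlier until
  // it lands on a user or assistant message (assumed direction).
  while (cut > 0 && entries[cut]?.role === "toolResult") {
    cut--;
  }
  return cut;
}
```

Because the function takes plain data and returns an index, it can be tested without a model, a store, or a session, which is the property the pure preparation step is meant to provide.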

There is no deriveCompactionDefaults(...) in the pinned source. The real exported functions include calculateContextTokens(...), estimateContextTokens(...), shouldCompact(...), prepareCompaction(...), and compact(...).

Summary Generation

The summarization prompt is structured. It asks for goal, constraints, progress, decisions, next steps, and critical context. The serializer also extracts file operations from assistant tool calls and appends read/modified file lists to the resulting summary.

That is important for a coding harness. A summary that says “worked on the parser” is weaker than a summary that preserves exact file paths and changed-file context.

The split-turn branch can make two internal model calls: one for previous history and one for the prefix of the current turn. Regular compaction makes one. compact(...) aggregates usage from those calls and returns it in the CompactionResult.
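
The aggregation step can be sketched as a pure merge over the results of one or two summarization calls. The names and shapes here are hypothetical, and joining the two summaries with a blank line is an assumption; the pinned compact(...) may combine them differently.

```typescript
interface UsageSketch {
  input: number;
  output: number;
}

interface SummaryCallResult {
  text: string;
  usage?: UsageSketch; // providers do not always report usage
}

interface MergedSummary {
  summary: string;
  usage: UsageSketch | undefined;
}

// Merges one call (regular compaction) or two calls (split turn) into a
// single summary plus the summed usage of the calls that reported any.
function mergeSummaryCallsSketch(calls: SummaryCallResult[]): MergedSummary {
  let usage: UsageSketch | undefined;
  for (const call of calls) {
    if (!call.usage) continue;
    usage = {
      input: (usage?.input ?? 0) + call.usage.input,
      output: (usage?.output ?? 0) + call.usage.output,
    };
  }
  const summary = calls.map((c) => c.text).join("\n\n");
  return { summary, usage };
}
```

Leaving usage undefined when no call reported it keeps "not reported" distinct from "cost zero".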

Appending The Compaction Entry

Session.runCompaction(...) takes the preparation result and calls compact(...). If the result succeeds, it appends a CompactionEntry with:

Field | Meaning
summary | Structured checkpoint text.
firstKeptEntryId | Entry where retained active-path detail resumes.
tokensBefore | Estimated context size before compaction.
details | Read and modified file lists.
usage | Cost of summarization calls, if reported.

Then it rebuilds this.harness.state.messages from this.history.buildContext(). This is the moment old active-path detail is replaced by a summary for future model calls.
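
A minimal sketch of how a compaction entry could drive the rebuild, assuming the field names from the table above. The field types, the kind discriminator, the shape of details, and buildContextSketch itself are hypothetical; the real SessionHistory.buildContext() works on Flue's own entry types.

```typescript
interface MessageEntrySketch {
  kind: "message";
  id: string;
  text: string;
}

interface CompactionEntrySketch {
  kind: "compaction";
  id: string;
  summary: string;
  firstKeptEntryId: string;
  tokensBefore: number;
  details?: { readFiles: string[]; modifiedFiles: string[] }; // assumed shape
  usage?: { input: number; output: number };
}

type PathEntrySketch = MessageEntrySketch | CompactionEntrySketch;

function buildContextSketch(path: PathEntrySketch[]): string[] {
  // Locate the most recent compaction entry on the active path.
  let compactionIndex = -1;
  for (let i = path.length - 1; i >= 0; i--) {
    if (path[i]?.kind === "compaction") {
      compactionIndex = i;
      break;
    }
  }
  const messagesFrom = (start: number): string[] =>
    path.slice(start).flatMap((e) => (e.kind === "message" ? [e.text] : []));

  const compaction = compactionIndex >= 0 ? path[compactionIndex] : undefined;
  if (!compaction || compaction.kind !== "compaction") {
    return messagesFrom(0); // nothing compacted yet: full detail
  }

  // Summary first, then retained detail starting at firstKeptEntryId.
  const firstKeptId = compaction.firstKeptEntryId;
  const firstKept = path.findIndex((e) => e.id === firstKeptId);
  const start = firstKept >= 0 ? firstKept : compactionIndex + 1;
  return [`[summary] ${compaction.summary}`, ...messagesFrom(start)];
}
```

The older entries stay in the path; they are simply no longer part of what the model sees.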

Overflow Recovery

Overflow recovery starts in Session.checkCompaction(...). If isContextOverflow(assistantMessage, contextWindow) returns true, the session does three things before compaction:

  1. Prevents recursive recovery with overflowRecoveryAttempted.
  2. Removes the failed assistant message from harness.state.messages.
  3. Removes the matching failed assistant leaf from SessionHistory.

Then runCompaction('overflow', true) compacts and retries. After appending the compaction entry and rebuilding context, it calls this.harness.continue(), waits for idle, and syncs retry messages back into history.

The session tree is what makes this explainable. The failed assistant turn can be removed as the leaf, then a compaction entry can be appended, then the retry can continue from the compacted path.
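
The ordering above can be sketched with a toy session object. Everything here is hypothetical and simplified: SessionSketch stands in for the real Session, arrays of strings stand in for messages and history entries, and the rebuild collapses to a single summary string instead of keeping a recent tail.

```typescript
interface SessionSketch {
  messages: string[]; // stands in for harness.state.messages
  historyPath: string[]; // stands in for the active path in SessionHistory
  overflowRecoveryAttempted: boolean;
  steps: string[]; // records the order of operations for inspection
}

// Returns true if recovery ran, false if it was already attempted.
function recoverFromOverflowSketch(session: SessionSketch): boolean {
  // 1. Guard against recursive recovery.
  if (session.overflowRecoveryAttempted) {
    return false;
  }
  session.overflowRecoveryAttempted = true;

  // 2 and 3. Drop the failed assistant turn from both views of the session.
  session.messages.pop();
  session.historyPath.pop();
  session.steps.push("remove-failed-leaf");

  // Persist the transition, then rebuild what the model will see.
  session.historyPath.push("compaction");
  session.steps.push("append-compaction-entry");
  session.messages = ["[summary]"]; // simplification: no retained tail
  session.steps.push("rebuild-context");

  // Only now is it safe to retry from the compacted path.
  session.steps.push("continue");
  return true;
}
```

A second overflow on the same session returns false instead of compacting again, which is the behavior the guard flag exists to provide.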

Usage Accounting

Compaction costs tokens. Hiding that cost would make the triggering call look artificially cheap.

The pinned source solves this by persisting summarization usage on the compaction entry and folding it into aggregateUsageSince(beforeLeafId). That aggregator walks active-path entries appended after the public call started and adds:

  • provider usage from assistant messages
  • compaction usage from compaction entries

That is why beforeLeafId matters. The public call samples the leaf before work starts. After prompt, retry, and compaction mutations, usage is calculated over the durable active path, not over a volatile array length.
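
A minimal sketch of that aggregation, assuming a flat array for the active path. The entry shape and aggregateUsageSinceSketch are hypothetical; the real aggregateUsageSince(beforeLeafId) walks Flue's session tree.

```typescript
interface UsageSketch {
  input: number;
  output: number;
}

interface PathEntrySketch {
  id: string;
  kind: "user" | "assistant" | "toolResult" | "compaction";
  usage?: UsageSketch;
}

function aggregateUsageSinceSketch(
  activePath: PathEntrySketch[],
  beforeLeafId: string | null,
): UsageSketch {
  // Index of the leaf sampled before the public call started. A null or
  // unknown id yields -1, so the whole path is counted (assumed fallback).
  const leafIndex =
    beforeLeafId === null ? -1 : activePath.findIndex((e) => e.id === beforeLeafId);

  const total: UsageSketch = { input: 0, output: 0 };
  for (const entry of activePath.slice(leafIndex + 1)) {
    // Both model turns and summarization calls cost tokens.
    const billable = entry.kind === "assistant" || entry.kind === "compaction";
    if (billable && entry.usage) {
      total.input += entry.usage.input;
      total.output += entry.usage.output;
    }
  }
  return total;
}
```

Anchoring on an entry id rather than an array offset means the result stays correct even when a failed leaf was removed and a compaction entry appended during the call.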

Tuning Guidance

The defaults are conservative for large-context coding agents. If you tune them, keep the two budgets separate:

Setting | Tradeoff
reserveTokens | Larger reserve triggers earlier compaction and leaves more headroom for provider output.
keepRecentTokens | Larger recent tail preserves more detail but frees less context per compaction.
enabled | Turning it off removes threshold compaction but not the need to handle provider overflow elsewhere.

The implementation is designed so that threshold compaction can be skipped when the model's context window is unknown, while overflow recovery still reacts to real provider failure. That is the right asymmetry.
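
To see how the two budgets interact, here is a small worked calculation. The 200000-token window is an assumed example value, describeBudgetsSketch is a hypothetical helper, and the "roughly summarized" figure is only an approximation of what a threshold-triggered compaction would hand to the summarizer.

```typescript
interface CompactionSettingsSketch {
  enabled: boolean;
  reserveTokens: number;
  keepRecentTokens: number;
}

// Values copied from DEFAULT_COMPACTION_SETTINGS as quoted earlier.
const defaultsSketch: CompactionSettingsSketch = {
  enabled: true,
  reserveTokens: 16384,
  keepRecentTokens: 20000,
};

function describeBudgetsSketch(contextWindow: number, s: CompactionSettingsSketch) {
  // Threshold compaction fires once usage exceeds this value.
  const triggerAbove = contextWindow - s.reserveTokens;
  // At the trigger point, about this much context falls outside the kept tail.
  const roughlySummarized = Math.max(0, triggerAbove - s.keepRecentTokens);
  return { triggerAbove, roughlySummarized };
}

// Assumed 200000-token window with the defaults:
// triggerAbove = 183616, roughlySummarized = 163616
const withDefaults = describeBudgetsSketch(200000, defaultsSketch);

// Doubling the reserve moves the trigger earlier without touching the tail:
// triggerAbove = 167232, roughlySummarized = 147232
const withLargerReserve = describeBudgetsSketch(200000, {
  ...defaultsSketch,
  reserveTokens: 32768,
});
```

Changing reserveTokens moves when compaction happens; changing keepRecentTokens moves how much survives it. Tuning one does not substitute for tuning the other.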

What Breaks If This Boundary Drifts

Drift | Failure
Compaction explained as a prompt trick | Readers miss cut points, session entries, usage, and retry behavior.
Cut at a tool result | The model sees a result without the tool call that caused it.
Failed overflow leaf is not removed | Retry starts from a poisoned context that already contains the failed turn.
Compaction usage is not counted | Reported usage understates the true cost of long-running calls.
Prior compaction details are dropped | Repeated summaries lose file-operation continuity.

What To Copy

The copyable pattern is to keep compaction as a pure preparation step plus an explicit persisted transition. Prepare without side effects, summarize with a bounded prompt, append a summary entry, rebuild context, and only then retry if recovery requires it.

That pattern is much easier to test than “occasionally rewrite the transcript.”

Verify In Source

  • compaction.ts exports calculateContextTokens(...), estimateContextTokens(...), shouldCompact(...), prepareCompaction(...), and compact(...).
  • prepareCompaction(...) finds valid cut points and avoids tool-result cut points.
  • compact(...) aggregates usage from one or two summarization calls.
  • Session.checkCompaction(...) handles threshold and overflow separately.
  • Session.runCompaction(...) emits compaction_start and compaction events.
  • Overflow recovery removes the failed assistant leaf before appending a compaction entry and retrying.
  • aggregateUsageSince(...) includes compaction entry usage.
