Compaction is easy to under-teach. If you describe it as “summarize old messages when context gets long,” you miss the architecture.
In Flue, compaction is a runtime state transition. It measures context, chooses a valid cut point, serializes older conversation, asks the model for a structured checkpoint, appends a compaction entry to the session tree, records usage, and sometimes retries after a provider context overflow.
The source pin for this chapter is withastro/flue@dbaa9effa305561c627c6836559f8a0cbce67875.
Compaction is only safe when the chosen cut preserves a runnable suffix.
Domain Word
In Flue, compaction is a state transition that replaces old active-path detail with a summary entry while preserving enough recent context for the session to continue.
The invariant is: context reduction must keep the conversation runnable and must not erase the accounting and recovery facts the harness needs.
The twelve-factor pressure is disposability. Long-running headless agents need recovery behavior when context pressure or provider overflow would otherwise kill the run.
The State Machine
    normal prompt turn
      │
      ▼
    assistant message arrives
      │
      ├─ stopReason is context overflow
      │    └─ remove failed assistant leaf
      │         -> runCompaction("overflow", willRetry=true)
      │         -> append compaction entry
      │         -> rebuild context
      │         -> continue rented Agent
      │
      └─ usage near context limit
           └─ runCompaction("threshold", willRetry=false)
                -> append compaction entry
                -> rebuild context
The threshold path is proactive. The overflow path is recovery. The second path is the one that makes compaction a reliability feature, not just a cost feature.
Settings And Measurement
packages/runtime/src/compaction.ts defines:
    export const DEFAULT_COMPACTION_SETTINGS = {
      enabled: true,
      reserveTokens: 16384,
      keepRecentTokens: 20000,
    };
The threshold trigger is intentionally tied to provider-reported usage from the latest assistant message. Session.checkCompaction(...) calls calculateContextTokens(assistantMessage.usage) and passes that value to shouldCompact(...), which checks whether it exceeds contextWindow - reserveTokens.
If the model has no known contextWindow, threshold compaction is skipped. Overflow recovery can still run because it is driven by the provider’s error signal.
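The threshold rule above can be sketched in a few lines. This is a minimal illustration, not the pinned implementation: the UsageSketch fields, the assumption that context size is the sum of all reported token counts, and the names contextTokensFromUsage and shouldCompactSketch are hypothetical stand-ins for calculateContextTokens(...) and shouldCompact(...), whose exact signatures are not shown in this chapter.

```typescript
// Hypothetical shapes for illustration; field names in the pinned source may differ.
interface UsageSketch {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

interface CompactionSettingsSketch {
  enabled: boolean;
  reserveTokens: number;
  keepRecentTokens: number;
}

const SETTINGS: CompactionSettingsSketch = {
  enabled: true,
  reserveTokens: 16384,
  keepRecentTokens: 20000,
};

// Assumption: context size is the sum of all provider-reported token counts.
function contextTokensFromUsage(usage: UsageSketch): number {
  return usage.input + usage.output + usage.cacheRead + usage.cacheWrite;
}

// Threshold check: compact once usage exceeds contextWindow - reserveTokens.
// An unknown window (undefined) skips threshold compaction entirely.
function shouldCompactSketch(
  contextTokens: number,
  contextWindow: number | undefined,
  settings: CompactionSettingsSketch,
): boolean {
  if (!settings.enabled) return false;
  if (contextWindow === undefined) return false;
  return contextTokens > contextWindow - settings.reserveTokens;
}
```

With a 200000-token window and the default reserve, the trigger point in this sketch is 183616 tokens. A model with no known window never trips the threshold, which is why overflow recovery has to exist as a separate path.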
Token measurement has two jobs:
| Job | Source |
|---|---|
| Threshold trigger | Provider usage on the latest assistant message. |
| Cut planning and tokensBefore | estimateContextTokens(...), which combines provider usage with conservative character estimates for trailing messages. |
That split is practical. Flue reacts to real provider usage when deciding whether the window is under pressure, then uses a conservative estimate when choosing what to summarize and what to keep.
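A rough sketch of the estimating side, under stated assumptions: the MessageSketch shape, the single usageTokens field, and the four-characters-per-token heuristic are illustrative choices, not values taken from the pinned source. The idea is to trust the most recent provider-reported figure and add a character-based estimate only for messages that arrived after it.

```typescript
// Hypothetical message shape for illustration only.
interface MessageSketch {
  role: "user" | "assistant" | "toolResult";
  text: string;
  // Present only on assistant messages that carry provider-reported usage.
  usageTokens?: number;
}

// Assumption: roughly 4 characters per token, rounded up to stay conservative.
function estimateTokensFromChars(text: string): number {
  return Math.ceil(text.length / 4);
}

// Start from the latest provider-reported usage, then add character
// estimates for every message that trails it.
function estimateContextTokensSketch(messages: MessageSketch[]): number {
  let lastUsageIndex = -1;
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const usage = messages[i]?.usageTokens;
    if (usage !== undefined) {
      lastUsageIndex = i;
      total = usage;
      break;
    }
  }
  // With no usage anywhere, lastUsageIndex stays -1 and every message is estimated.
  for (const message of messages.slice(lastUsageIndex + 1)) {
    total += estimateTokensFromChars(message.text);
  }
  return total;
}
```

The design point is that the estimate never discards real provider data: character counting only fills the gap between the last measured turn and the current end of the conversation.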
Cut Points
prepareCompaction(...) is pure: no model call, no store write, no session mutation. It finds what should be summarized and what should remain.
The cut-point rules are deliberately constrained:
| Rule | Why |
|---|---|
| Valid cut points are user or assistant messages, never tool results. | Keeping a tool result without its surrounding turn breaks conversation shape. |
| Recent tail is kept based on keepRecentTokens. | The model needs fresh work, not only a high-level summary. |
| Prior compaction details can be carried forward. | Repeated compaction should preserve known file-operation context. |
| Split turns get a separate prefix summary. | If the cut falls inside a turn, the retained suffix still needs early-turn context. |
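The first two rules can be sketched as a backward walk. This is a simplified illustration, not prepareCompaction(...) itself: EntrySketch and findCutIndexSketch are hypothetical names, per-entry token counts are assumed to be precomputed, and split-turn handling is omitted. When the walk lands on a tool result, this sketch moves the cut earlier so the result stays with the call that produced it; the pinned source may resolve that case differently.

```typescript
type Role = "user" | "assistant" | "toolResult";

// Hypothetical entry shape; tokens is an estimated size for the entry.
interface EntrySketch {
  id: string;
  role: Role;
  tokens: number;
}

// Returns the index of the first entry to keep, or -1 when there is
// nothing worth summarizing. Pure: no model call, no mutation.
function findCutIndexSketch(entries: EntrySketch[], keepRecentTokens: number): number {
  let kept = 0;
  let cut = -1;
  // Walk backward from the newest entry until the recent tail is large enough.
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (!entry) continue;
    kept += entry.tokens;
    if (kept >= keepRecentTokens) {
      cut = i;
      break;
    }
  }
  // Never cut at a tool result: step back to the nearest user or assistant message.
  while (cut > 0 && entries[cut]?.role === "toolResult") {
    cut--;
  }
  // A cut at index 0 (or none at all) leaves nothing to summarize.
  return cut > 0 ? cut : -1;
}
```

Because the function only reads its inputs, it can be tested with plain arrays and no model or store, which is the main benefit of keeping preparation pure.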
There is no deriveCompactionDefaults(...) in the pinned source. The real exported functions include calculateContextTokens(...), estimateContextTokens(...), shouldCompact(...), prepareCompaction(...), and compact(...).
Summary Generation
The summarization prompt is structured. It asks for goal, constraints, progress, decisions, next steps, and critical context. The serializer also extracts file operations from assistant tool calls and appends read/modified file lists to the resulting summary.
That is important for a coding harness. A summary that says “worked on the parser” is weaker than a summary that preserves exact file paths and changed-file context.
The split-turn branch can make two internal model calls: one for previous history and one for the prefix of the current turn. Regular compaction makes one. compact(...) aggregates usage from those calls and returns it in the CompactionResult.
Appending The Compaction Entry
Session.runCompaction(...) takes the preparation result and calls compact(...). If the result succeeds, it appends a CompactionEntry with:
| Field | Meaning |
|---|---|
| summary | Structured checkpoint text. |
| firstKeptEntryId | Entry where retained active-path detail resumes. |
| tokensBefore | Estimated context size before compaction. |
| details | Read and modified file lists. |
| usage | Cost of summarization calls, if reported. |
Then it rebuilds this.harness.state.messages from this.history.buildContext(). This is the moment old active-path detail is replaced by a summary for future model calls.
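To make the rebuild step concrete, here is a minimal sketch of how a context builder could honor a compaction entry. The entry union, the buildContextSketch name, and the choice to inject the summary as a user-role message are assumptions for illustration; the real buildContext() in the pinned source may shape the summary message differently.

```typescript
type MessageRole = "user" | "assistant" | "toolResult";

// Hypothetical active-path entries: ordinary messages plus compaction entries.
type PathEntrySketch =
  | { kind: "message"; id: string; role: MessageRole; text: string }
  | { kind: "compaction"; id: string; summary: string; firstKeptEntryId: string; tokensBefore: number };

interface ContextMessage {
  role: MessageRole;
  text: string;
}

function buildContextSketch(path: PathEntrySketch[]): ContextMessage[] {
  // Locate the most recent compaction entry on the active path, if any.
  let compactionIndex = -1;
  for (let i = path.length - 1; i >= 0; i--) {
    if (path[i]?.kind === "compaction") {
      compactionIndex = i;
      break;
    }
  }
  const compaction = compactionIndex >= 0 ? path[compactionIndex] : undefined;
  const out: ContextMessage[] = [];
  let keeping = true;
  if (compaction && compaction.kind === "compaction") {
    // Assumption: the summary is presented to the model as a user-role message.
    out.push({ role: "user", text: `Summary of earlier conversation:\n${compaction.summary}` });
    keeping = false; // drop detail until firstKeptEntryId is reached
  }
  for (let i = 0; i < path.length; i++) {
    const entry = path[i];
    if (!entry || entry.kind !== "message") continue;
    if (compaction && compaction.kind === "compaction" && i < compactionIndex) {
      if (entry.id === compaction.firstKeptEntryId) keeping = true;
      if (!keeping) continue;
    }
    out.push({ role: entry.role, text: entry.text });
  }
  return out;
}
```

Nothing is deleted from the path in this sketch. The older entries still exist; they simply stop contributing to what the model sees, which matches the idea of a persisted transition rather than a transcript rewrite.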
Overflow Recovery
Overflow recovery starts in Session.checkCompaction(...). If isContextOverflow(assistantMessage, contextWindow) returns true, the session does three things before compaction:
- Prevents recursive recovery with overflowRecoveryAttempted.
- Removes the failed assistant message from harness.state.messages.
- Removes the matching failed assistant leaf from SessionHistory.
Then runCompaction('overflow', true) compacts and retries. After appending the compaction entry and rebuilding context, it calls this.harness.continue(), waits for idle, and syncs retry messages back into history.
The session tree is what makes this explainable. The failed assistant turn can be removed as the leaf, then a compaction entry can be appended, then the retry can continue from the compacted path.
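The ordering of those steps can be sketched with plain in-memory state. Everything here is a hypothetical stand-in: RecoveryStateSketch models harness.state.messages and the active path as arrays of ids, and compactAndRetry is an injected callback standing in for the compaction and retry work. The point is the guard and the order, not Flue's actual method bodies.

```typescript
// Hypothetical state; ids stand in for real messages and history entries.
interface RecoveryStateSketch {
  overflowRecoveryAttempted: boolean;
  messages: string[]; // stands in for harness.state.messages
  activePath: string[]; // stands in for the active path in SessionHistory
}

// Returns true when recovery ran, false when the guard blocked a second attempt.
function recoverFromOverflowSketch(
  state: RecoveryStateSketch,
  failedId: string,
  compactAndRetry: () => void,
): boolean {
  // 1. Guard: only one recovery attempt, so a second overflow cannot recurse.
  if (state.overflowRecoveryAttempted) return false;
  state.overflowRecoveryAttempted = true;

  // 2. Remove the failed assistant message from the live message list.
  state.messages = state.messages.filter((id) => id !== failedId);

  // 3. Remove the matching failed leaf from the durable path.
  if (state.activePath[state.activePath.length - 1] === failedId) {
    state.activePath.pop();
  }

  // 4. Only now compact and retry, from a path that no longer contains the failure.
  compactAndRetry();
  return true;
}
```

If step 4 ran before steps 2 and 3, the compaction entry would be appended after the failed turn and the retry would inherit it.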
Usage Accounting
Compaction costs tokens. Hiding that cost would make the triggering call look artificially cheap.
The pinned source solves this by persisting summarization usage on the compaction entry and folding it into aggregateUsageSince(beforeLeafId). That aggregator walks active-path entries appended after the public call started and adds:
- provider usage from assistant messages
- compaction usage from compaction entries
That is why beforeLeafId matters. The public call samples the leaf before work starts. After prompt, retry, and compaction mutations, usage is calculated over the durable active path, not over a volatile array length.
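A minimal sketch of that aggregation, assuming a simplified usage shape with only input and output counts. The UsageTotals and UsageEntrySketch types and the aggregateUsageSinceSketch name are illustrative; the real aggregateUsageSince(...) operates on Flue's own entry types.

```typescript
// Hypothetical, simplified usage shape.
interface UsageTotals {
  input: number;
  output: number;
}

// Hypothetical active-path entries; usage is optional because providers may not report it.
type UsageEntrySketch =
  | { kind: "user"; id: string }
  | { kind: "assistant"; id: string; usage?: UsageTotals }
  | { kind: "compaction"; id: string; usage?: UsageTotals };

// Sum usage over entries appended after beforeLeafId on the active path.
// A null id means the call started on an empty session. In this sketch an id
// that is not on the path also counts everything (findIndex returns -1).
function aggregateUsageSinceSketch(
  activePath: UsageEntrySketch[],
  beforeLeafId: string | null,
): UsageTotals {
  const total: UsageTotals = { input: 0, output: 0 };
  const start =
    beforeLeafId === null ? 0 : activePath.findIndex((entry) => entry.id === beforeLeafId) + 1;
  for (const entry of activePath.slice(start)) {
    if (entry.kind === "user") continue;
    // Both assistant messages and compaction entries contribute.
    if (entry.usage) {
      total.input += entry.usage.input;
      total.output += entry.usage.output;
    }
  }
  return total;
}
```

Keying the window on an entry id rather than an array index is what keeps the total correct after a failed leaf is removed and a compaction entry is appended in the same call.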
Tuning Guidance
The defaults are conservative for large-context coding agents. If you tune them, keep the two budgets separate:
| Setting | Tradeoff |
|---|---|
| reserveTokens | Larger reserve triggers earlier compaction and leaves more headroom for provider output. |
| keepRecentTokens | Larger recent tail preserves detail but leaves less context reduction. |
| enabled | Turning it off removes threshold compaction but not the need to handle provider overflow elsewhere. |
The implementation is designed so that threshold compaction is skipped when the model's context window is unknown, while overflow recovery still reacts to a real provider failure. That is the right asymmetry.
What Breaks If This Boundary Drifts
| Drift | Failure |
|---|---|
| Compaction explained as a prompt trick | Readers miss cut points, session entries, usage, and retry behavior. |
| Cut at a tool result | The model sees a result without the tool call that caused it. |
| Failed overflow leaf is not removed | Retry starts from a poisoned context that already contains the failed turn. |
| Compaction usage is not counted | Reported usage understates the true cost of long-running calls. |
| Prior compaction details are dropped | Repeated summaries lose file-operation continuity. |
What To Copy
The copyable pattern is to keep compaction as a pure preparation step plus an explicit persisted transition. Prepare without side effects, summarize with a bounded prompt, append a summary entry, rebuild context, and only then retry if recovery requires it.
That pattern is much easier to test than “occasionally rewrite the transcript.”
Verify In Source
- compaction.ts exports calculateContextTokens(...), estimateContextTokens(...), shouldCompact(...), prepareCompaction(...), and compact(...).
- prepareCompaction(...) finds valid cut points and avoids tool-result cut points.
- compact(...) aggregates usage from one or two summarization calls.
- Session.checkCompaction(...) handles threshold and overflow separately.
- Session.runCompaction(...) emits compaction_start and compaction events.
- Overflow recovery removes the failed assistant leaf before appending a compaction entry and retrying.
- aggregateUsageSince(...) includes compaction entry usage.