A tool schema is not just documentation for the model. It is a promise. If the schema says the model can request a timeout, the runtime has to carry that timeout far enough that the command actually observes it.
This chapter reads the boundary where Flue turns model-visible tools into host-specific behavior: packages/runtime/src/agent.ts, packages/runtime/src/sandbox.ts, and the public types in types.ts.
The source pin for this chapter is withastro/flue@dbaa9effa305561c627c6836559f8a0cbce67875.
A tool contract is only real when schema, validation, and host execution preserve the same promise.
Domain Word
In Flue, a tool contract is the schema plus the runtime promise behind it. Session environment is the capability boundary the tool uses. Sandbox API is the adapter shape for local, remote, or platform-specific execution.
The invariant is: if the model-visible schema advertises a capability, Flue must either enforce it or fail in a shape the model and caller can understand.
The twelve-factor pressure is backing services. Sandboxes, filesystems, shells, and remote execution providers are attached resources. The harness should depend on a capability interface, not on one host implementation.
The Contract Crosses A Boundary
    model sees tool schema
      │
      ▼
    agent.ts built-in tool
      │ validates params and chooses runtime behavior
      ▼
    SessionEnv
      │ exec/read/write/stat/list capability boundary
      ▼
    sandbox.ts adapter
      │ just-bash, cwd wrapper, custom SandboxApi, platform adapter
      ▼
    host execution
      │
      ▼
    tool result returned to model and run events
The model never sees SandboxApi. It sees tools. The adapter never sees the model’s reasoning. It sees capability calls. Flue sits in the middle and has to preserve the contract both ways.
Built-In Tools
At the pinned source, BUILTIN_TOOL_NAMES contains:
- read
- write
- edit
- bash
- grep
- glob
- task
createTools(...) builds those tools from a SessionEnv. Most tools are thin wrappers around file or shell capabilities:
| Tool | Runtime path |
|---|---|
| read | env.stat(...); if the path is a directory, env.readdir(...); otherwise env.readFile(...) |
| write | normalizes the path, then env.writeFile(...) |
| edit | env.readFile(...), string replacement, then env.writeFile(...) |
| bash | env.exec(...) with timeout and abort handling |
| grep | shell command through env.exec(...) |
| glob | shell command through env.exec(...) |
| task | framework-owned child session delegation |
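The read row can be made concrete with a small sketch. This is not Flue's implementation: `ReadEnv`, `FileStat`, `readTool`, and `createMemoryEnv` are hypothetical names, and the environment is an in-memory fake standing in for a real SessionEnv. It only illustrates the stat-then-branch shape described in the table.

```typescript
// Hypothetical, trimmed-down capability surface.
// The real SessionEnv in types.ts has more methods and may differ in shape.
interface FileStat {
  isDirectory: boolean;
}

interface ReadEnv {
  stat(path: string): Promise<FileStat>;
  readdir(path: string): Promise<string[]>;
  readFile(path: string): Promise<string>;
}

// Sketch of a read tool: stat first, then list a directory or read a file.
async function readTool(env: ReadEnv, path: string): Promise<string> {
  const info = await env.stat(path);
  if (info.isDirectory) {
    const entries = await env.readdir(path);
    return entries.join("\n");
  }
  return env.readFile(path);
}

// In-memory fake environment, used only to exercise the sketch.
function createMemoryEnv(files: Record<string, string>): ReadEnv {
  const childrenOf = (path: string): string[] =>
    Object.keys(files).filter((file) => file.startsWith(path + "/"));
  return {
    async stat(path) {
      if (path in files) return { isDirectory: false };
      if (childrenOf(path).length > 0) return { isDirectory: true };
      throw new Error(`ENOENT: ${path}`);
    },
    async readdir(path) {
      // Keep only the first path segment below the directory.
      const names = childrenOf(path).map(
        (file) => file.slice(path.length + 1).split("/")[0] ?? "",
      );
      return Array.from(new Set(names)).sort();
    },
    async readFile(path) {
      const content = files[path];
      if (content === undefined) throw new Error(`ENOENT: ${path}`);
      return content;
    },
  };
}
```

Because the tool only talks to the narrow interface, the same `readTool` would work against any adapter that implements those three methods.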
Custom tools are also allowed, but Session.validateCustomToolNames(...) rejects names that collide with built-ins or duplicate another custom tool. Tool names are part of the model contract; ambiguity there becomes runtime confusion.
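A minimal sketch of that name check, assuming custom tools are plain objects with a `name` field. The function below is a hypothetical stand-in for Session.validateCustomToolNames(...), not a copy of it; the `CustomTool` type and the error messages are illustrative.

```typescript
// Built-in names as listed at the pinned source.
const BUILTIN_TOOL_NAMES = ["read", "write", "edit", "bash", "grep", "glob", "task"] as const;

// Hypothetical minimal shape for a custom tool.
interface CustomTool {
  name: string;
}

// Rejects names that collide with built-ins or repeat an earlier custom tool.
function validateCustomToolNames(tools: CustomTool[]): void {
  const builtins = new Set<string>(BUILTIN_TOOL_NAMES);
  const seen = new Set<string>();
  for (const tool of tools) {
    if (builtins.has(tool.name)) {
      throw new Error(`Custom tool "${tool.name}" collides with a built-in tool`);
    }
    if (seen.has(tool.name)) {
      throw new Error(`Duplicate custom tool name "${tool.name}"`);
    }
    seen.add(tool.name);
  }
}
```

Failing at registration time keeps the ambiguity out of the model-visible tool list entirely, instead of resolving it silently at call time.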
Bash Timeout Is The Canonical Bug
The bash tool is the best source-level example because it crosses multiple layers.
The tool schema has an optional timeout number. createBashTool(...) then enforces it in two ways:
- Pass timeout into env.exec(...) as a provider-native hint.
- Compose a local AbortSignal.timeout(...) with the incoming signal.
If the timeout fires, the LLM-facing tool returns a recoverable shell-shaped result with exit code 124 and a timeout message. That behavior lives in the tool layer because the model needs a tool result it can reason about. Programmatic callers can use AbortSignal.timeout(...) and receive ordinary cancellation behavior.
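The two channels and the shell-shaped timeout result can be sketched as follows. This is an illustration under assumptions, not createBashTool(...) itself: `ExecEnv`, `ExecOptions`, `ExecResult`, and `runBash` are hypothetical names, the timeout is assumed to be in milliseconds, and the sketch uses a manual AbortController plus setTimeout in place of AbortSignal.timeout(...) so it can tell a timeout apart from a caller abort.

```typescript
// Hypothetical shapes; the real types live in types.ts and may differ.
interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

interface ExecOptions {
  timeout?: number; // assumed to be milliseconds in this sketch
  signal?: AbortSignal;
}

interface ExecEnv {
  exec(command: string, options?: ExecOptions): Promise<ExecResult>;
}

async function runBash(
  env: ExecEnv,
  params: { command: string; timeout?: number },
  signal?: AbortSignal,
): Promise<ExecResult> {
  const controller = new AbortController();
  let timedOut = false;

  // Channel 2a: propagate the caller's abort into the composed signal.
  const onAbort = () => controller.abort();
  if (signal?.aborted) controller.abort();
  signal?.addEventListener("abort", onAbort, { once: true });

  // Channel 2b: a local timer that aborts the same composed signal.
  const timer =
    params.timeout === undefined
      ? undefined
      : setTimeout(() => {
          timedOut = true;
          controller.abort();
        }, params.timeout);

  try {
    // Channel 1: forward the timeout as a provider-native hint as well.
    return await env.exec(params.command, {
      timeout: params.timeout,
      signal: controller.signal,
    });
  } catch (error) {
    if (timedOut) {
      // Recoverable, shell-shaped result the model can reason about.
      return { stdout: "", stderr: `Command timed out after ${params.timeout}ms`, exitCode: 124 };
    }
    throw error; // caller aborts and other failures keep ordinary behavior
  } finally {
    if (timer !== undefined) clearTimeout(timer);
    signal?.removeEventListener("abort", onAbort);
  }
}
```

Only the timeout path is converted into a tool result. An abort from the caller still rejects, which matches the split between model-facing recovery and programmatic cancellation described above.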
This is the lesson from PR #25: a schema field that does not reach the runtime is a broken promise.
SessionEnv Is The Capability Boundary
types.ts defines SessionEnv. It includes:
- exec(...)
- readFile(...)
- readFileBuffer(...)
- writeFile(...)
- stat(...)
- readdir(...)
- exists(...)
SessionEnv.exec(...) accepts env, cwd, signal, and timeout. The comments are unusually important: timeout is the primary cancellation contract for sandbox connectors because many remote providers expose native timeout options while fewer support mid-flight abort signals. signal is still valuable for local and in-process implementations.
That dual channel is what lets Flue support different hosts without weakening the tool contract to the least capable provider.
SandboxApi Is The Adapter Surface
sandbox.ts defines SandboxApi for external sandbox instances. createSandboxSessionEnv(api, cwd) wraps that API into a SessionEnv, resolving relative paths against cwd and forwarding execution options.
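As a rough sketch of what such a wrapper does, assume a sandbox API with only `exec` and `readFile`. The interfaces and the `sketchSandboxSessionEnv` and `resolvePath` helpers below are hypothetical and much smaller than the real SandboxApi and createSandboxSessionEnv(...). The path helper handles POSIX-style paths only and does not normalize `..` segments.

```typescript
// Hypothetical shapes, reduced to what the sketch needs.
interface ExecOptions {
  cwd?: string;
  env?: Record<string, string>;
  timeout?: number;
  signal?: AbortSignal;
}

interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

interface SandboxApi {
  exec(command: string, options: ExecOptions): Promise<ExecResult>;
  readFile(path: string): Promise<string>;
}

interface SessionEnv {
  exec(command: string, options?: ExecOptions): Promise<ExecResult>;
  readFile(path: string): Promise<string>;
}

// Resolve a relative path against cwd; absolute paths pass through unchanged.
function resolvePath(cwd: string, path: string): string {
  if (path.startsWith("/")) return path;
  return `${cwd.replace(/\/+$/, "")}/${path}`;
}

// Wrap a sandbox API so tools only ever see the SessionEnv surface.
function sketchSandboxSessionEnv(api: SandboxApi, cwd: string): SessionEnv {
  return {
    exec: (command, options = {}) =>
      api.exec(command, {
        ...options, // env, timeout, and signal are forwarded untouched
        cwd: options.cwd ? resolvePath(cwd, options.cwd) : cwd,
      }),
    readFile: (path) => api.readFile(resolvePath(cwd, path)),
  };
}
```

The point of the spread is that the wrapper never drops an option it does not understand, so a timeout set by a tool still reaches the provider.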
There are also helper adapters:
| Adapter | Purpose |
|---|---|
| createFlueFs(env) | Exposes file operations as the public FlueFs surface. |
| createCwdSessionEnv(parentEnv, cwd) | Creates a cwd-scoped child environment for task sessions. |
| bashFactoryToSessionEnv(...) | Adapts just-bash factories. |
| createSandboxSessionEnv(...) | Adapts external sandbox APIs. |
The adapter layer is where host differences belong. A custom sandbox can use a remote container, a Durable Object, or a local shell, but the tools should continue to call SessionEnv.
task Is A Framework-Owned Tool
The task tool is different from read or bash. It does not just call a host capability. It asks Flue to create a child agent session.
That makes it a harness feature. The tool describes delegation to the model, but Session.runTaskForTool(...) and runTask(...) own task IDs, role inheritance, cwd overrides, depth limits, cancellation, child session creation, event emission, and result shaping.
This is why “tool” should not mean “function.” Some tools are adapters to backing services. Others are framework behaviors exposed through the model’s tool interface.
Connector Author Checklist
If you write a sandbox or connector for Flue, the contract is practical:
| Concern | Requirement |
|---|---|
| Timeout | Forward timeout to the provider’s native timeout when possible. |
| Abort | Honor signal where the provider supports cancellation. |
| Paths | Resolve paths consistently against cwd. |
| Errors | Return or throw errors in a shape callers can diagnose. |
| Output | Keep stdout/stderr/exit code semantics stable for shell results. |
| Files | Keep text and buffer file operations separate. |
The goal is not to make every sandbox identical. The goal is to make host differences stay behind SessionEnv instead of leaking into tool prose or user agent code.
What Breaks If This Boundary Drifts
| Drift | Failure |
|---|---|
| Tool schema is treated as docs only | The model plans around capabilities that do not exist. |
| Timeout stays in bash params only | Remote commands run past the advertised deadline. |
| Sandbox adapters expose host quirks directly | Agent code becomes target-specific. |
| Custom tool names collide with built-ins | The model cannot know which behavior a name means. |
| task is implemented as inline prompting | Delegation loses session identity, cleanup, and context isolation. |
What To Copy
The copyable pattern is contract continuity. Put the schema near the tool, put capabilities behind a narrow environment interface, and make adapters responsible for host-specific translation.
When a tool promise crosses process, container, provider, or platform boundaries, test the promise end to end. Unit tests around schema parsing are not enough.
Verify In Source
- agent.ts defines BUILTIN_TOOL_NAMES and createTools(...).
- createBashTool(...) passes timeout to env.exec(...) and composes a timeout signal.
- types.ts documents SessionEnv.exec(...) timeout as the primary connector cancellation contract.
- sandbox.ts defines SandboxApi and createSandboxSessionEnv(...).
- createCwdSessionEnv(...) forwards cwd and timeout into the parent env.
- Session.validateCustomToolNames(...) rejects built-in collisions and duplicate custom tools.
- Session.runTaskForTool(...) routes the task tool through child session creation.