Tool Contracts And Sandbox Reality

A tool schema is not just documentation for the model. It is a promise. If the schema says the model can request a timeout, the runtime has to carry that timeout far enough that the command actually observes it.

This chapter reads the boundary where Flue turns model-visible tools into host-specific behavior: packages/runtime/src/agent.ts, packages/runtime/src/sandbox.ts, and the public types in types.ts.

The source pin for this chapter is withastro/flue@dbaa9effa305561c627c6836559f8a0cbce67875.

Figure: Schema Promise Meets Runtime Capability. A tool schema promise crosses into the SessionEnv and SandboxApi runtime capabilities, with timeout propagation.

Domain Word

In Flue, a tool contract is the schema plus the runtime promise behind it. The session environment (SessionEnv) is the capability boundary the tool uses. The sandbox API (SandboxApi) is the adapter shape for local, remote, or platform-specific execution.

The invariant is: if the model-visible schema advertises a capability, Flue must either enforce it or fail in a shape the model and caller can understand.

The relevant twelve-factor principle is backing services. Sandboxes, filesystems, shells, and remote execution providers are attached resources, so the harness should depend on a capability interface, not on one host implementation.

The Contract Crosses A Boundary

TOOL CONTRACT PATH

model sees tool schema
│
▼
agent.ts built-in tool
│ validates params and chooses runtime behavior
▼
SessionEnv
│ exec/read/write/stat/list capability boundary
▼
sandbox.ts adapter
│ just-bash, cwd wrapper, custom SandboxApi, platform adapter
▼
host execution
│
▼
tool result returned to model and run events

The model never sees SandboxApi. It sees tools. The adapter never sees the model’s reasoning. It sees capability calls. Flue sits in the middle and has to preserve the contract both ways.

Built-In Tools

At the pinned source, BUILTIN_TOOL_NAMES contains:

read
write
edit
bash
grep
glob
task

createTools(...) builds those tools from a SessionEnv. Most tools are thin wrappers around file or shell capabilities:

Tool	Runtime path
`read`	`env.stat(...)`; if the path is a directory, `env.readdir(...)`; otherwise `env.readFile(...)`
`write`	normalizes the path, then `env.writeFile(...)`
`edit`	`env.readFile(...)`, string replacement, then `env.writeFile(...)`
`bash`	`env.exec(...)` with timeout and abort handling
`grep`	shell command through `env.exec(...)`
`glob`	shell command through `env.exec(...)`
`task`	framework-owned child session delegation
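The `read` row can be sketched as follows. This is an illustration, not Flue's code: FileEnv, StatLike, readPath, and createMemoryEnv are hypothetical names, and the real signatures in types.ts may differ. The in-memory environment exists only to exercise the sketch.

```typescript
// Hypothetical subset of a session environment; real Flue signatures may differ.
interface StatLike {
  isDirectory: boolean;
}

interface FileEnv {
  stat(path: string): Promise<StatLike>;
  readdir(path: string): Promise<string[]>;
  readFile(path: string): Promise<string>;
}

// Mirrors the `read` row: stat first, then list a directory or read a file.
async function readPath(env: FileEnv, path: string): Promise<string> {
  const info = await env.stat(path);
  if (info.isDirectory) {
    const entries = await env.readdir(path);
    return entries.join("\n");
  }
  return env.readFile(path);
}

// In-memory stand-in for a host, keyed by absolute file path.
function createMemoryEnv(files: Map<string, string>): FileEnv {
  const childrenOf = (dir: string): string[] => {
    const prefix = dir.endsWith("/") ? dir : dir + "/";
    const names = new Set<string>();
    files.forEach((_content, key) => {
      if (!key.startsWith(prefix)) return;
      const rest = key.slice(prefix.length);
      const slash = rest.indexOf("/");
      names.add(slash === -1 ? rest : rest.slice(0, slash));
    });
    return Array.from(names).sort();
  };
  return {
    async stat(path) {
      if (files.has(path)) return { isDirectory: false };
      if (childrenOf(path).length > 0) return { isDirectory: true };
      throw new Error(`ENOENT: ${path}`);
    },
    async readdir(path) {
      return childrenOf(path);
    },
    async readFile(path) {
      const content = files.get(path);
      if (content === undefined) throw new Error(`ENOENT: ${path}`);
      return content;
    },
  };
}
```

The tool function never touches the host directly. Swapping the in-memory environment for a remote one would not change readPath.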

Custom tools are also allowed, but Session.validateCustomToolNames(...) rejects names that collide with built-ins or duplicate another custom tool. Tool names are part of the model contract; ambiguity there becomes runtime confusion.
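A minimal sketch of that check, assuming the built-in list above. The function below is a standalone stand-in, not the real Session method: its signature and error messages are assumptions.

```typescript
// Built-in names as listed in this chapter for the pinned source.
const BUILTIN_TOOL_NAMES: readonly string[] = [
  "read",
  "write",
  "edit",
  "bash",
  "grep",
  "glob",
  "task",
];

// Hypothetical stand-in for the validation step; the real signature may differ.
function validateCustomToolNames(names: readonly string[]): void {
  const seen = new Set<string>();
  for (const name of names) {
    if (BUILTIN_TOOL_NAMES.indexOf(name) !== -1) {
      throw new Error(`Custom tool "${name}" collides with a built-in tool`);
    }
    if (seen.has(name)) {
      throw new Error(`Duplicate custom tool name "${name}"`);
    }
    seen.add(name);
  }
}
```

Failing at registration time keeps the ambiguity out of the model-visible tool list entirely.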

Bash Timeout Is The Canonical Bug

The bash tool is the best source-level example because it crosses multiple layers.

The tool schema has an optional timeout number. createBashTool(...) then enforces it in two ways:

Pass timeout into env.exec(...) as a provider-native hint.
Compose a local AbortSignal.timeout(...) with the incoming signal.

If the timeout fires, the LLM-facing tool returns a recoverable shell-shaped result with exit code 124 and a timeout message. That behavior lives in the tool layer because the model needs a tool result it can reason about. Programmatic callers can use AbortSignal.timeout(...) and receive ordinary cancellation behavior.

This is the lesson from PR #25: a schema field that does not reach the runtime is a broken promise.
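The two enforcement channels can be sketched like this. It is an illustration under assumptions, not createBashTool(...) itself: ExecEnv, ExecOptions, ShellResult, and runBashTool are hypothetical names, the millisecond unit is assumed, and the runtime must provide AbortSignal.timeout and AbortSignal.any (for example Node 20.3 or later).

```typescript
// Hypothetical shapes; Flue's real types live in types.ts and may differ.
interface ExecOptions {
  timeout?: number; // provider-native hint; milliseconds is an assumption here
  signal?: AbortSignal;
}

interface ShellResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

interface ExecEnv {
  exec(command: string, options?: ExecOptions): Promise<ShellResult>;
}

// Sketch of a bash tool body that enforces the schema's timeout twice.
async function runBashTool(
  env: ExecEnv,
  command: string,
  timeoutMs?: number,
  incoming?: AbortSignal,
): Promise<ShellResult> {
  const timeoutSignal =
    timeoutMs === undefined ? undefined : AbortSignal.timeout(timeoutMs);
  const signals: AbortSignal[] = [];
  if (incoming) signals.push(incoming);
  if (timeoutSignal) signals.push(timeoutSignal);
  const signal = signals.length > 0 ? AbortSignal.any(signals) : undefined;

  try {
    // Channel 1: native timeout hint. Channel 2: composed abort signal.
    return await env.exec(command, { timeout: timeoutMs, signal });
  } catch (error) {
    const callerAborted = incoming?.aborted ?? false;
    if (timeoutSignal?.aborted && !callerAborted) {
      // Recoverable, shell-shaped result the model can reason about.
      return {
        stdout: "",
        stderr: `Command timed out after ${timeoutMs}ms`,
        exitCode: 124,
      };
    }
    // Caller cancellation and other failures propagate unchanged.
    throw error;
  }
}
```

Only the local timeout is converted into a tool result. A cancellation requested by the caller still rejects, which matches the split between LLM-facing and programmatic behavior described above.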

`SessionEnv` Is The Capability Boundary

types.ts defines SessionEnv. It includes:

exec(...)
readFile(...)
readFileBuffer(...)
writeFile(...)
stat(...)
readdir(...)
exists(...)

SessionEnv.exec(...) accepts env, cwd, signal, and timeout. The comments are unusually important: timeout is the primary cancellation contract for sandbox connectors because many remote providers expose native timeout options while fewer support mid-flight abort signals. signal is still valuable for local and in-process implementations.

That dual channel is what lets Flue support different hosts without weakening the tool contract to the least capable provider.
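From the connector side, the dual channel might look like the sketch below. RemoteClient and its request shape describe an imaginary provider that supports a native timeout but no mid-flight abort. execOnRemote is a hypothetical adapter function, and env and cwd are omitted for brevity.

```typescript
// Imaginary remote provider: native timeout, no mid-flight cancellation.
interface RemoteRunRequest {
  command: string;
  timeoutMs?: number;
}

interface RemoteRunResponse {
  stdout: string;
  stderr: string;
  exitCode: number;
}

interface RemoteClient {
  run(request: RemoteRunRequest): Promise<RemoteRunResponse>;
}

// Hypothetical subset of exec options; units are assumed to be milliseconds.
interface ExecOptions {
  timeout?: number;
  signal?: AbortSignal;
}

async function execOnRemote(
  client: RemoteClient,
  command: string,
  options: ExecOptions = {},
): Promise<RemoteRunResponse> {
  // Best effort for `signal`: never start work that is already cancelled.
  options.signal?.throwIfAborted();

  const request: RemoteRunRequest = { command };
  if (options.timeout !== undefined) {
    // Primary channel: hand the deadline to the provider's own timeout.
    request.timeoutMs = options.timeout;
  }
  return client.run(request);
}
```

A local or in-process adapter could instead wire signal straight into the child process. The caller's options stay the same in both cases.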

`SandboxApi` Is The Adapter Surface

sandbox.ts defines SandboxApi for external sandbox instances. createSandboxSessionEnv(api, cwd) wraps that API into a SessionEnv, resolving relative paths against cwd and forwarding execution options.
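Path resolution against cwd can be sketched as below. MiniSandboxApi, withCwd, and resolveAgainst are hypothetical names for illustration. The resolver is hand-rolled with POSIX-style rules to keep the example dependency-free, and the real createSandboxSessionEnv(...) may resolve paths differently.

```typescript
// Resolve `target` against `cwd` with POSIX-style rules.
function resolveAgainst(cwd: string, target: string): string {
  const combined = target.startsWith("/") ? target : `${cwd}/${target}`;
  const parts: string[] = [];
  for (const segment of combined.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      parts.pop();
      continue;
    }
    parts.push(segment);
  }
  return "/" + parts.join("/");
}

// Hypothetical one-method sandbox API, far smaller than the real SandboxApi.
interface MiniSandboxApi {
  readFile(path: string): Promise<string>;
}

// Wrap the API so callers can pass relative paths.
function withCwd(api: MiniSandboxApi, cwd: string): MiniSandboxApi {
  return {
    readFile: (path) => api.readFile(resolveAgainst(cwd, path)),
  };
}
```

With this wrapper, a call such as withCwd(api, "/work/repo").readFile("README.md") reaches the underlying API as /work/repo/README.md.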

There are also helper adapters:

Adapter	Purpose
`createFlueFs(env)`	Exposes file operations as the public `FlueFs` surface.
`createCwdSessionEnv(parentEnv, cwd)`	Creates a cwd-scoped child environment for task sessions.
`bashFactoryToSessionEnv(...)`	Adapts just-bash factories.
`createSandboxSessionEnv(...)`	Adapts external sandbox APIs.

The adapter layer is where host differences belong. A custom sandbox can use a remote container, a Durable Object, or a local shell, but the tools should continue to call SessionEnv.

`task` Is A Framework-Owned Tool

The task tool is different from read or bash. It does not just call a host capability. It asks Flue to create a child agent session.

That makes it a harness feature. The tool describes delegation to the model, but Session.runTaskForTool(...) and runTask(...) own task IDs, role inheritance, cwd overrides, depth limits, cancellation, child session creation, event emission, and result shaping.

This is why “tool” should not mean “function.” Some tools are adapters to backing services. Others are framework behaviors exposed through the model’s tool interface.
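To make the difference concrete, here is a small sketch of the bookkeeping a framework-owned tool implies. Everything in it is hypothetical: TaskContext, TaskRequest, createChildTask, the ID format, and the depth limit of 3 are illustrative choices, not values taken from Flue.

```typescript
// Hypothetical per-session context the harness tracks for delegation.
interface TaskContext {
  taskId: string;
  role: string;
  cwd: string;
  depth: number;
}

// What the model may ask for through the task tool (assumed fields).
interface TaskRequest {
  role?: string;
  cwd?: string;
}

const MAX_TASK_DEPTH = 3; // illustrative limit, not Flue's actual value
let nextTaskNumber = 1;

function createChildTask(parent: TaskContext, request: TaskRequest): TaskContext {
  if (parent.depth >= MAX_TASK_DEPTH) {
    throw new Error(`Task depth limit of ${MAX_TASK_DEPTH} reached`);
  }
  return {
    taskId: `task-${nextTaskNumber++}`,
    // The child inherits role and cwd unless the request overrides them.
    role: request.role ?? parent.role,
    cwd: request.cwd ?? parent.cwd,
    depth: parent.depth + 1,
  };
}
```

None of this state is visible in the tool schema, which is why the harness rather than a host adapter has to own it.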

Connector Author Checklist

If you write a sandbox or connector for Flue, the contract is practical:

Concern	Requirement
Timeout	Forward `timeout` to the provider’s native timeout when possible.
Abort	Honor `signal` where the provider supports cancellation.
Paths	Resolve paths consistently against cwd.
Errors	Return or throw errors in a shape callers can diagnose.
Output	Keep stdout/stderr/exit code semantics stable for shell results.
Files	Keep text and buffer file operations separate.
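For the Output and Errors rows, one possible approach is to normalize whatever the provider returns into a single stable shape. RawProviderResult is an imaginary provider response and toShellResult is a hypothetical helper. Treating a missing exit code as failure is a design assumption of this sketch, not something the source prescribes.

```typescript
// Stable result shape that tools rely on (assumed field names).
interface ShellResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Imaginary provider response with loose, optional fields.
interface RawProviderResult {
  output?: string;
  errorOutput?: string;
  code?: number | null;
}

function toShellResult(raw: RawProviderResult): ShellResult {
  return {
    stdout: raw.output ?? "",
    stderr: raw.errorOutput ?? "",
    // Assumption: an absent exit code is reported as failure, not success.
    exitCode: typeof raw.code === "number" ? raw.code : 1,
  };
}
```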

The goal is not to make every sandbox identical. The goal is to make host differences stay behind SessionEnv instead of leaking into tool prose or user agent code.

What Breaks If This Boundary Drifts

Drift	Failure
Tool schema is treated as docs only	The model plans around capabilities that do not exist.
Timeout stays in `bash` params only	Remote commands run past the advertised deadline.
Sandbox adapters expose host quirks directly	Agent code becomes target-specific.
Custom tool names collide with built-ins	The model cannot know which behavior a name means.
`task` is implemented as inline prompting	Delegation loses session identity, cleanup, and context isolation.

What To Copy

The copyable pattern is contract continuity. Put the schema near the tool, put capabilities behind a narrow environment interface, and make adapters responsible for host-specific translation.

When a tool promise crosses process, container, provider, or platform boundaries, test the promise end to end. Unit tests around schema parsing are not enough.
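An end-to-end check can be as small as measuring whether a command really stops near its deadline. The sketch below is one way to write it, under assumptions: ExecEnv is a hypothetical interface, the timeout unit is milliseconds, and the command string is a placeholder for something that would normally outlive the deadline.

```typescript
// Hypothetical minimal shapes for the check; real Flue types may differ.
interface ExecOptions {
  timeout?: number; // milliseconds (assumed)
}

interface ShellResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

interface ExecEnv {
  exec(command: string, options?: ExecOptions): Promise<ShellResult>;
}

// Returns true when the environment comes back within the deadline plus a grace period.
async function honorsTimeout(
  env: ExecEnv,
  timeoutMs: number,
  graceMs: number,
): Promise<boolean> {
  const started = Date.now();
  try {
    await env.exec("sleep 60", { timeout: timeoutMs });
  } catch {
    // Rejecting on timeout is acceptable; only elapsed time is checked here.
  }
  return Date.now() - started <= timeoutMs + graceMs;
}
```

Run against a real adapter, a check like this fails when the timeout is dropped anywhere between the schema and the host, which schema-level unit tests cannot detect.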

Verify In Source

agent.ts defines BUILTIN_TOOL_NAMES and createTools(...).
createBashTool(...) passes timeout to env.exec(...) and composes a timeout signal.
types.ts documents SessionEnv.exec(...) timeout as the primary connector cancellation contract.
sandbox.ts defines SandboxApi and createSandboxSessionEnv(...).
createCwdSessionEnv(...) forwards cwd and timeout into the parent env.
Session.validateCustomToolNames(...) rejects built-in collisions and duplicate custom tools.
Session.runTaskForTool(...) routes the task tool through child session creation.