Tool Contracts And Sandbox Reality

Summary

How Flue turns model-visible tool schemas into runtime behavior across SessionEnv, SandboxApi, and host adapters.

A tool schema is not just documentation for the model. It is a promise. If the schema says the model can request a timeout, the runtime has to carry that timeout far enough that the command actually observes it.

This chapter reads the boundary where Flue turns model-visible tools into host-specific behavior: packages/runtime/src/agent.ts, packages/runtime/src/sandbox.ts, and the public types in types.ts.

The source pin for this chapter is withastro/flue@dbaa9effa305561c627c6836559f8a0cbce67875.

Schema Promise Meets Runtime Capability

A tool contract is only real when schema, validation, and host execution preserve the same promise.

Domain Word

In Flue, a tool contract is the schema plus the runtime promise behind it. Session environment is the capability boundary the tool uses. Sandbox API is the adapter shape for local, remote, or platform-specific execution.

The invariant is: if the model-visible schema advertises a capability, Flue must either enforce it or fail in a shape the model and caller can understand.

The twelve-factor pressure here is backing services: sandboxes, filesystems, shells, and remote execution providers are attached resources. The harness should therefore depend on a capability interface, not on one host implementation.

The Contract Crosses A Boundary

TOOL CONTRACT PATH

model sees tool schema
  ↓
agent.ts built-in tool: validates params and chooses runtime behavior
  ↓
SessionEnv: exec/read/write/stat/list capability boundary
  ↓
sandbox.ts adapter: just-bash, cwd wrapper, custom SandboxApi, platform adapter
  ↓
host execution
  ↓
tool result returned to model and run events

The model never sees SandboxApi. It sees tools. The adapter never sees the model’s reasoning. It sees capability calls. Flue sits in the middle and has to preserve the contract both ways.

Built-In Tools

At the pinned source, BUILTIN_TOOL_NAMES contains:

read
write
edit
bash
grep
glob
task

createTools(...) builds those tools from a SessionEnv. Most tools are thin wrappers around file or shell capabilities:

Tool | Runtime path
read | env.stat(...); if the path is a directory, env.readdir(...); otherwise env.readFile(...)
write | normalizes the path, then env.writeFile(...)
edit | env.readFile(...), string replacement, then env.writeFile(...)
bash | env.exec(...) with timeout and abort handling
grep | shell command through env.exec(...)
glob | shell command through env.exec(...)
task | framework-owned child session delegation

Custom tools are also allowed, but Session.validateCustomToolNames(...) rejects names that collide with built-ins or duplicate another custom tool. Tool names are part of the model contract; ambiguity there becomes runtime confusion.
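
The collision rule can be sketched in a few lines. This is a hypothetical standalone version, not the body of Session.validateCustomToolNames(...): the CustomTool shape and the error messages are assumptions, and only the built-in names come from the pinned source.

```typescript
// Built-in names as listed in this chapter.
const BUILTIN_TOOL_NAMES = ["read", "write", "edit", "bash", "grep", "glob", "task"] as const;

// Assumed minimal shape; a real custom tool also carries a schema and a handler.
interface CustomTool {
  name: string;
}

// Hypothetical standalone validator: throws on the first ambiguous name.
function validateCustomToolNames(tools: CustomTool[]): void {
  const builtins = new Set<string>(BUILTIN_TOOL_NAMES);
  const seen = new Set<string>();
  for (const tool of tools) {
    if (builtins.has(tool.name)) {
      throw new Error(`Custom tool "${tool.name}" collides with a built-in tool name`);
    }
    if (seen.has(tool.name)) {
      throw new Error(`Duplicate custom tool name "${tool.name}"`);
    }
    seen.add(tool.name);
  }
}
```

Rejecting at registration time keeps the ambiguity out of the model-visible tool list entirely, instead of resolving it at call time.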

Bash Timeout Is The Canonical Bug

The bash tool is the best source-level example because it crosses multiple layers.

The tool schema has an optional timeout number. createBashTool(...) then enforces it in two ways:

  1. Pass timeout into env.exec(...) as a provider-native hint.
  2. Compose a local AbortSignal.timeout(...) with the incoming signal.

If the timeout fires, the LLM-facing tool returns a recoverable shell-shaped result with exit code 124 and a timeout message. That behavior lives in the tool layer because the model needs a tool result it can reason about. Programmatic callers can use AbortSignal.timeout(...) and receive ordinary cancellation behavior.
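
A minimal sketch of that two-channel enforcement follows. It is not the body of createBashTool(...): the ExecEnv, ExecOptions, and ExecResult shapes are assumptions, the timeout unit is assumed to be milliseconds, and the signals are composed by hand with AbortController (standing in for AbortSignal.timeout(...)) so the sketch runs on older runtimes.

```typescript
// Hypothetical shapes for illustration; Flue's real types live in types.ts.
interface ExecResult { stdout: string; stderr: string; exitCode: number; }
interface ExecOptions { timeout?: number; signal?: AbortSignal; }
interface ExecEnv { exec(command: string, options?: ExecOptions): Promise<ExecResult>; }

async function runBashTool(
  env: ExecEnv,
  params: { command: string; timeout?: number }, // unit assumed to be milliseconds
  incoming?: AbortSignal,
): Promise<ExecResult> {
  const controller = new AbortController();
  let timedOut = false;

  // Local deadline: fires even if the provider ignores its native timeout hint.
  const timer = params.timeout === undefined
    ? undefined
    : setTimeout(() => { timedOut = true; controller.abort(); }, params.timeout);

  // Compose with the caller's signal so either source cancels the command.
  const onAbort = () => controller.abort();
  if (incoming?.aborted) controller.abort();
  else incoming?.addEventListener("abort", onAbort, { once: true });

  try {
    // Channel 1: provider-native timeout hint. Channel 2: composed abort signal.
    return await env.exec(params.command, { timeout: params.timeout, signal: controller.signal });
  } catch (error) {
    if (timedOut) {
      // Recoverable, shell-shaped result the model can reason about.
      return { stdout: "", stderr: `Command timed out after ${params.timeout}ms`, exitCode: 124 };
    }
    throw error; // caller cancellation and real failures propagate unchanged
  } finally {
    if (timer !== undefined) clearTimeout(timer);
    incoming?.removeEventListener("abort", onAbort);
  }
}
```

The sketch assumes env.exec(...) rejects once its signal aborts. A host that ignores the signal would need an explicit race against the deadline to return on time.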

This is the lesson from PR #25: a schema field that does not reach the runtime is a broken promise.

SessionEnv Is The Capability Boundary

types.ts defines SessionEnv. It includes:

  • exec(...)
  • readFile(...)
  • readFileBuffer(...)
  • writeFile(...)
  • stat(...)
  • readdir(...)
  • exists(...)

SessionEnv.exec(...) accepts env, cwd, signal, and timeout. The comments are unusually important: timeout is the primary cancellation contract for sandbox connectors because many remote providers expose native timeout options while fewer support mid-flight abort signals. signal is still valuable for local and in-process implementations.

That dual channel is what lets Flue support different hosts without weakening the tool contract to the least capable provider.
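
To make the dual channel concrete, here is a hedged sketch of how a remote connector might translate exec options. The four option names come from the chapter; their types, the RemoteExecRequest wire format, and the toRemoteRequest helper are hypothetical.

```typescript
// Assumed option shape: field names are from the chapter, types and units are guesses.
interface ExecOptions {
  env?: Record<string, string>;
  cwd?: string;
  signal?: AbortSignal;
  timeout?: number;
}

// Hypothetical wire format for a provider with a native timeout but no mid-flight abort.
interface RemoteExecRequest {
  command: string;
  cwd?: string;
  env?: Record<string, string>;
  timeout?: number;
}

function toRemoteRequest(command: string, options: ExecOptions = {}): RemoteExecRequest {
  // The signal can still be honored before dispatch, even if the provider cannot abort later.
  if (options.signal?.aborted) {
    throw new Error("exec cancelled before dispatch");
  }
  // The timeout travels as the provider-native deadline.
  return { command, cwd: options.cwd, env: options.env, timeout: options.timeout };
}
```

A local or in-process host would instead pass the signal straight through to the process it spawns and treat timeout as a secondary hint.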

SandboxApi Is The Adapter Surface

sandbox.ts defines SandboxApi for external sandbox instances. createSandboxSessionEnv(api, cwd) wraps that API into a SessionEnv, resolving relative paths against cwd and forwarding execution options.
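
The path-resolution half of that wrapper can be illustrated as below. This is a sketch under assumptions, not the code in sandbox.ts: resolveAgainstCwd, MiniSandboxApi, and wrapReadFile are hypothetical names, paths are assumed POSIX-style, and cwd is assumed absolute.

```typescript
// Hypothetical helper: resolve a tool-supplied path against the session cwd (POSIX-style).
function resolveAgainstCwd(cwd: string, path: string): string {
  // Absolute paths pass through; relative paths are anchored at cwd.
  const joined = path.startsWith("/") ? path : `${cwd}/${path}`;
  const parts: string[] = [];
  for (const segment of joined.split("/")) {
    if (segment === "" || segment === ".") continue; // skip empty and current-dir segments
    if (segment === "..") parts.pop();                // step up one directory
    else parts.push(segment);
  }
  return "/" + parts.join("/");
}

// Assumed one-method slice of a sandbox API, enough to show the wrapping.
interface MiniSandboxApi {
  readFile(path: string): Promise<string>;
}

// The wrapper resolves first, so the sandbox only ever sees absolute paths.
function wrapReadFile(api: MiniSandboxApi, cwd: string): (path: string) => Promise<string> {
  return (path) => api.readFile(resolveAgainstCwd(cwd, path));
}
```

Resolving in the wrapper rather than in each tool means every file capability agrees on what a relative path means.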

There are also helper adapters:

Adapter | Purpose
createFlueFs(env) | Exposes file operations as the public FlueFs surface.
createCwdSessionEnv(parentEnv, cwd) | Creates a cwd-scoped child environment for task sessions.
bashFactoryToSessionEnv(...) | Adapts just-bash factories.
createSandboxSessionEnv(...) | Adapts external sandbox APIs.

The adapter layer is where host differences belong. A custom sandbox can use a remote container, a Durable Object, or a local shell, but the tools should continue to call SessionEnv.

task Is A Framework-Owned Tool

The task tool is different from read or bash. It does not just call a host capability. It asks Flue to create a child agent session.

That makes it a harness feature. The tool describes delegation to the model, but Session.runTaskForTool(...) and runTask(...) own task IDs, role inheritance, cwd overrides, depth limits, cancellation, child session creation, event emission, and result shaping.

This is why “tool” should not mean “function.” Some tools are adapters to backing services. Others are framework behaviors exposed through the model’s tool interface.
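
A rough sketch of the bookkeeping a framework-owned tool implies is shown below. Everything here is hypothetical: the SessionContext and TaskRequest shapes, the depth limit of 3, the task ID format, and the planChildSession helper are illustrations, not Flue's runTask(...) logic.

```typescript
// Hypothetical shapes; Flue's real logic lives in Session.runTaskForTool(...) and runTask(...).
interface SessionContext { depth: number; cwd: string; role?: string; }
interface TaskRequest { prompt: string; cwd?: string; role?: string; }

const MAX_TASK_DEPTH = 3; // assumed limit, for illustration only
let nextTaskId = 0;

function planChildSession(parent: SessionContext, request: TaskRequest) {
  if (parent.depth + 1 > MAX_TASK_DEPTH) {
    // Fail in a shape the model can read rather than recursing without bound.
    return { ok: false as const, error: `task depth limit (${MAX_TASK_DEPTH}) reached` };
  }
  nextTaskId += 1;
  return {
    ok: true as const,
    taskId: `task-${nextTaskId}`, // the framework, not the model, owns task identity
    child: {
      depth: parent.depth + 1,
      cwd: request.cwd ?? parent.cwd,    // cwd override, else inherit
      role: request.role ?? parent.role, // role inheritance
    },
  };
}
```

None of these decisions depend on a host capability, which is the sense in which task belongs to the harness rather than to an adapter.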

Connector Author Checklist

If you write a sandbox or connector for Flue, the contract is practical:

Concern | Requirement
Timeout | Forward timeout to the provider’s native timeout when possible.
Abort | Honor signal where the provider supports cancellation.
Paths | Resolve paths consistently against cwd.
Errors | Return or throw errors in a shape callers can diagnose.
Output | Keep stdout/stderr/exit code semantics stable for shell results.
Files | Keep text and buffer file operations separate.

The goal is not to make every sandbox identical. The goal is to make host differences stay behind SessionEnv instead of leaking into tool prose or user agent code.
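
The Output row of the checklist can be illustrated with a small normalizer. The ProviderResponse shape, the normalizeResult helper, and the choice of exit code 1 for a missing status are assumptions for this sketch, not part of Flue.

```typescript
// Hypothetical provider response; real providers differ, which is why normalizing matters.
interface ProviderResponse {
  output?: string;
  errorOutput?: string;
  // May be null: Node's child_process, for example, reports a null exit code
  // when a process is terminated by a signal.
  status?: number | null;
}

// Stable shell-shaped result that tools can rely on regardless of host.
interface ShellResult { stdout: string; stderr: string; exitCode: number; }

function normalizeResult(raw: ProviderResponse): ShellResult {
  return {
    stdout: raw.output ?? "",
    stderr: raw.errorOutput ?? "",
    // Assumed convention: a missing status becomes non-zero, never a silent success.
    exitCode: typeof raw.status === "number" ? raw.status : 1,
  };
}
```

Keeping this translation inside the connector means the bash tool never has to know which provider produced the result.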

What Breaks If This Boundary Drifts

Drift | Failure
Tool schema is treated as docs only | The model plans around capabilities that do not exist.
Timeout stays in bash params only | Remote commands run past the advertised deadline.
Sandbox adapters expose host quirks directly | Agent code becomes target-specific.
Custom tool names collide with built-ins | The model cannot know which behavior a name means.
task is implemented as inline prompting | Delegation loses session identity, cleanup, and context isolation.

What To Copy

The copyable pattern is contract continuity. Put the schema near the tool, put capabilities behind a narrow environment interface, and make adapters responsible for host-specific translation.

When a tool promise crosses process, container, provider, or platform boundaries, test the promise end to end. Unit tests around schema parsing are not enough.

Verify In Source

  • agent.ts defines BUILTIN_TOOL_NAMES and createTools(...).
  • createBashTool(...) passes timeout to env.exec(...) and composes a timeout signal.
  • types.ts documents SessionEnv.exec(...) timeout as the primary connector cancellation contract.
  • sandbox.ts defines SandboxApi and createSandboxSessionEnv(...).
  • createCwdSessionEnv(...) forwards cwd and timeout into the parent env.
  • Session.validateCustomToolNames(...) rejects built-in collisions and duplicate custom tools.
  • Session.runTaskForTool(...) routes the task tool through child session creation.
