# Tools & rich messages
Tools are server-side helpers your agent can invoke mid-turn. The visitor types something, the LLM decides "I need to call a tool for this", the tool runs on the server, and its result either feeds back into the LLM's final answer or surfaces directly in the widget as a structured block (e.g. a "Connect me with a human" button).
Tools are gated by the agent's vertical capabilities. Each tool declares which capability it requires; the registry only exposes a tool to agents whose capability list contains that slug. Admins can further narrow the list with `vertical_overrides.enabled_tools`.
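The gating rule is a simple two-step filter. Below is an illustrative TypeScript model of it, not the server's PHP registry; the `ToolDef` and `AgentConfig` shapes and the `forAgent` signature are assumptions for the sketch:

```typescript
// Illustrative model of the capability gate. The real registry is a PHP
// service; the names and shapes here are assumptions, not the actual code.
interface ToolDef {
  name: string;
  capability: string; // slug this tool requires
}

interface AgentConfig {
  capabilities: string[];  // the agent's vertical capability slugs
  enabledTools?: string[]; // vertical_overrides.enabled_tools (optional allow-list)
}

function forAgent(allTools: ToolDef[], agent: AgentConfig): ToolDef[] {
  return allTools.filter((tool) => {
    // Gate 1: the agent's capability list must contain the tool's slug.
    if (!agent.capabilities.includes(tool.capability)) return false;
    // Gate 2: an admin allow-list, when present, narrows the set further.
    return agent.enabledTools === undefined || agent.enabledTools.includes(tool.name);
  });
}
```

So a `help_center` agent with `ticket_escalation` in its capability list sees `escalate_to_human`, while an agent without that capability (or one whose admin override excludes the tool) gets an empty tool list.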
## The shipping tool
| Tool | Required capability | Verticals | What it does |
|---|---|---|---|
| `escalate_to_human` | `ticket_escalation` | `help_center` | Surfaces a "Connect me with a human" button (block type `escalation_button`) and a result the LLM can incorporate. Clicking it triggers the existing lead-capture flow so an operator can claim the conversation. |
More tools land as later phases ship: `lookup_product`, `order_status`, `find_in_docs`, `book_demo`, and so on. Each tool is a single PHP class implementing `App\Services\Tools\Contracts\Tool`.
## How the hot path resolves tool calls
For every visitor turn on a tool-enabled agent, `MessageStreamController` runs a small tool-resolution loop before the streaming final answer:

- Build the OpenAI-style `tools` array from the registry's `forAgent($agent)` result.
- Call `llm->chatWithTools(messages, tools)` non-streaming. The model either returns `tool_calls` (it wants to invoke one or more tools) or `content` (it's ready to answer).
- If `tool_calls`: emit a `tool_call` SSE event for each invocation, run the tool's `execute()`, append the tool result to the message history as a `{role: 'tool'}` message, and loop.
- Once the model returns `content` (or after 3 hops, whichever comes first), fall through to the existing `streamChat` path. The visitor still gets token-by-token streaming for the final answer, so TTFT is preserved.
- Any `block` payloads the tools produced (e.g. `escalation_button`) are emitted as `block` SSE events for the widget to render inline.
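The control flow of that loop can be sketched as follows. This is a TypeScript model of the server-side PHP loop, not the real controller: the LLM client, tool runner, and event emitter are stand-in signatures, and only the hop limit of 3 and the `{role: 'tool'}` message shape come from the description above.

```typescript
// Sketch of the tool-resolution loop; all collaborators are stand-ins.
type Message =
  | { role: 'user' | 'assistant' | 'system'; content: string }
  | { role: 'tool'; tool_call_id: string; content: string };

type ToolCall = { id: string; name: string; arguments: string };
type LlmReply = { toolCalls?: ToolCall[]; content?: string };

const MAX_HOPS = 3; // safety net; well-described tools converge in 1 hop

async function resolveTools(
  messages: Message[],
  tools: object[],
  chatWithTools: (m: Message[], t: object[]) => Promise<LlmReply>,
  runTool: (call: ToolCall) => Promise<string>,
  emit: (event: string, data: unknown) => void,
): Promise<Message[]> {
  for (let hop = 0; hop < MAX_HOPS; hop++) {
    const reply = await chatWithTools(messages, tools);
    if (!reply.toolCalls?.length) break; // model returned content: ready to answer
    for (const call of reply.toolCalls) {
      emit('tool_call', { name: call.name });  // tell the widget a tool is running
      const result = await runTool(call);      // the tool's execute(), server-side
      messages.push({ role: 'tool', tool_call_id: call.id, content: result });
    }
  }
  return messages; // fall through to the streaming path for the final answer
}
```

A model without tool support simply never returns `toolCalls`, so the loop exits on the first hop, which is the graceful-degradation behavior described for Workers AI below.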
## Provider compatibility
| Provider | Tool calling | Notes |
|---|---|---|
| OpenAI (gpt-4o-mini, gpt-4o) | Native | Full OpenAI tools array support via the SDK. |
| OpenRouter | Model-dependent | Tool-capable models (Claude 3.5, Llama 3.3 70B Hermes, etc.) work via the same OpenAI-compatible surface. |
| Cloudflare Workers AI | Model-dependent | Llama 3.3 70B Hermes and a handful of other models support function calling. Models without tool support gracefully degrade — they'll ignore the tools array and return content directly, so the loop simply exits. |
## Widget rendering of blocks

The widget receives `block` SSE events during a turn and attaches each block to the in-flight assistant message. The renderer registry in `resources/widget/src/ui/blocks.tsx` maps block `type` → Preact component. Unknown block types are silently dropped (forward compatibility with newer servers).
The widget's `canRender(capability, agent)` helper now returns `true` when:

- the bundle ships a renderer for that capability, AND
- the agent's server-resolved `capabilities` array opted in.

Both are required: the widget never enables a capability the server didn't authorize, and never tries to render a block whose renderer isn't in the bundle.
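A minimal sketch of that two-gate check, with the renderer registry reduced to a set of capability slugs and the agent shape simplified (both are assumptions; the real helper lives alongside `blocks.tsx`):

```typescript
// Sketch of the widget's two-gate capability check. RENDERABLE stands in
// for the bundle's renderer registry; the agent shape is simplified.
const RENDERABLE = new Set<string>(['ticket_escalation']); // capabilities with a shipped renderer

interface ResolvedAgent {
  capabilities: string[]; // server-resolved opt-ins for this agent
}

function canRender(capability: string, agent: ResolvedAgent): boolean {
  // Gate 1: the bundle must ship a renderer for this capability.
  // Gate 2: the server must have authorized it for this agent.
  return RENDERABLE.has(capability) && agent.capabilities.includes(capability);
}
```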
## Adding a new tool
- Implement `App\Services\Tools\Contracts\Tool` in `app/Services/Tools/Tools/YourTool.php`. Pick a unique `name()`, write a clear `description()` (the LLM uses it to decide when to invoke), declare the `capability()` slug it requires, and define the `schema()` JSON.
- Register the tool in `ToolRegistry::__construct`.
- If your tool's `execute()` returns a `block` payload, ship a renderer for it in `ui/blocks.tsx` and add the relevant capability slug to the `RENDERABLE` set in `capabilities.ts`.
- Add a Pest unit test for the tool and a feature test for the end-to-end flow using `FakeOpenAi::pushToolCall`.
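For the `schema()` step, the OpenAI tools format expects a function name, a description, and a JSON Schema for the parameters. A hypothetical example for the planned `order_status` tool (field values are illustrative, not the shipped schema):

```json
{
  "name": "order_status",
  "description": "Look up the shipping status of an order by its order number.",
  "parameters": {
    "type": "object",
    "properties": {
      "order_number": {
        "type": "string",
        "description": "The visitor's order number, e.g. from their confirmation email."
      }
    },
    "required": ["order_number"]
  }
}
```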
## Latency considerations
The tool loop adds one non-streaming round-trip per hop before the streaming final answer kicks in. For a typical "needs one tool" turn that's roughly +200–500 ms of latency before the visitor sees the first token. The 99% case (no tools used) is unchanged because the registry returns an empty tool list for agents whose capabilities don't match any registered tool.
To keep latency manageable, write tool descriptions tightly so the LLM only invokes a tool when it really needs one. The hop limit (3) is a safety net — well-written tool descriptions should converge in 1 hop.