openrouter:subagent server tool.
Outcome: Your app sends complex tasks to an orchestrator that automatically delegates focused work (summarization, extraction, reformatting, drafting) to a cheap worker, cutting token cost on bulk generation while keeping planning quality high.
Before you start
You need:
- Node.js 20 or newer
- An OpenRouter API key in OPENROUTER_API_KEY
- A workflow that already calls OpenRouter (Chat Completions, Responses, or Agent SDK)
- An orchestrator model that supports tool calling (e.g. ~anthropic/claude-opus-latest). Check the model’s capabilities on the model page before choosing.
- A cheaper worker model for subtasks (e.g. ~anthropic/claude-haiku-latest). Browse model pricing to find the cheapest model that meets your quality bar.
Aliases such as ~anthropic/claude-opus-latest and ~anthropic/claude-haiku-latest auto-resolve to the newest version in their model family. Find available aliases on each model’s page at /models. You can also use exact slugs (e.g. anthropic/claude-haiku-4.5) when you need to pin a specific version.
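If you keep model choices in config, it can help to make the alias-versus-pinned decision explicit. This is a small sketch that assumes, based only on the examples above, that floating aliases are the slugs starting with ~. The constant names and both helpers are hypothetical.

```typescript
// Hypothetical model config. ASSUMPTION: a leading "~" marks a floating
// alias (based on the alias examples above); plain slugs are pinned versions.
const ORCHESTRATOR_MODEL = "~anthropic/claude-opus-latest";
const WORKER_MODEL_FLOATING = "~anthropic/claude-haiku-latest";
const WORKER_MODEL_PINNED = "anthropic/claude-haiku-4.5";

// True when the slug tracks the newest version in its model family.
function isFloatingAlias(slug: string): boolean {
  return slug.startsWith("~");
}

// Prefer the pinned slug when reproducibility matters more than freshness.
function chooseWorkerModel(pinVersion: boolean): string {
  return pinVersion ? WORKER_MODEL_PINNED : WORKER_MODEL_FLOATING;
}

console.log(isFloatingAlias(ORCHESTRATOR_MODEL)); // true
console.log(chooseWorkerModel(true)); // anthropic/claude-haiku-4.5
```

Pinning the worker is mostly useful when you have tuned instructions or output parsing against one specific version.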
If you’re starting a new TypeScript agent, use the Agent SDK callModel API for the orchestrator loop. The samples below use Chat Completions so the server-tool request shape is visible, but the delegation pattern works the same way inside an Agent SDK workflow.
Use these references for exact schemas:
- Subagent server tool
- Agent SDK callModel overview
- Create a chat completion
- Create a response
- TypeScript SDK Chat reference
What you’re building
This recipe adds a task-delegation layer to a multi-step workflow. The orchestrator model receives a complex request and decides how to break it apart. For each piece that doesn’t need its full capability, it calls openrouter:subagent with a task_name and a task_description. The worker executes the task and returns its outcome. The orchestrator integrates all outcomes into the final response.
1. Add the subagent tool to your request
The minimal setup: one openrouter:subagent entry in the tools array with the worker model pinned in parameters.
The orchestrator’s final answer arrives in message.content. The usage object reflects the combined token spend of the orchestrator and its worker calls, per Server tools: Usage Tracking.
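Here is a minimal sketch of that request in TypeScript using Node.js 20's built-in fetch against the Chat Completions endpoint. The tool entry shape { type: "openrouter:subagent", parameters: { model } } is an assumption drawn from the description above, so confirm it against the Subagent server tool reference. buildDelegatingRequest and runDelegatingRequest are hypothetical helpers.

```typescript
// ASSUMPTION: server tools are declared as { type, parameters }.
// Check the Subagent server tool reference for the exact entry shape.
interface ServerTool {
  type: string;
  parameters?: Record<string, unknown>;
}

interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  tools: ServerTool[];
}

// Hypothetical helper: one subagent entry with the worker model pinned.
function buildDelegatingRequest(
  orchestratorModel: string,
  workerModel: string,
  userPrompt: string,
): ChatRequest {
  return {
    model: orchestratorModel,
    messages: [{ role: "user", content: userPrompt }],
    tools: [{ type: "openrouter:subagent", parameters: { model: workerModel } }],
  };
}

// Sends the request; returns the final answer and the combined usage.
// Not invoked here. Pass the value of OPENROUTER_API_KEY as apiKey.
async function runDelegatingRequest(apiKey: string, body: ChatRequest) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`OpenRouter returned ${res.status}: ${await res.text()}`);
  }
  const data = (await res.json()) as {
    choices?: { message?: { content?: string } }[];
    usage?: Record<string, number>;
  };
  return {
    content: data.choices?.[0]?.message?.content,
    usage: data.usage, // orchestrator plus worker spend
  };
}

const request = buildDelegatingRequest(
  "~anthropic/claude-opus-latest",
  "~anthropic/claude-haiku-latest",
  "Summarize the breaking changes in the attached release notes and draft an upgrade checklist.",
);
console.log(JSON.stringify(request.tools));
```

Keeping request construction separate from the network call makes it easy to log or unit-test the tool configuration without spending tokens.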
The orchestrator decides whether and when to delegate. Each delegation passes two arguments:
- task_name: a short label (e.g. summarize-breaking-changes)
- task_description: everything the worker needs, including all context, inputs, and the expected output format
The worker sees only the task_description. It has no access to the parent conversation, so the orchestrator must be explicit about what it wants back.
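In practice the orchestrator model writes these arguments itself. To make "self-contained" concrete, the sketch below uses a hypothetical helper (not part of any OpenRouter SDK) to assemble a task_description from a goal, context, inputs, and output format. You could reuse the same checklist in the orchestrator's system prompt. The sample inputs are invented for illustration.

```typescript
// Hypothetical shape of one delegation, mirroring the two arguments.
interface SubagentCall {
  task_name: string; // short label, e.g. "summarize-breaking-changes"
  task_description: string; // everything the worker needs
}

interface TaskParts {
  goal: string;
  context: string;
  inputs: string;
  outputFormat: string;
}

// Hypothetical helper, not an OpenRouter API. The worker cannot see the
// parent conversation, so context, inputs, and format are all inlined.
function makeSubagentCall(taskName: string, parts: TaskParts): SubagentCall {
  const task_description = [
    `Goal: ${parts.goal}`,
    `Context: ${parts.context}`,
    `Inputs:\n${parts.inputs}`,
    `Return: ${parts.outputFormat}`,
  ].join("\n\n");
  return { task_name: taskName, task_description };
}

const call = makeSubagentCall("summarize-breaking-changes", {
  goal: "Summarize the breaking changes for an upgrade guide.",
  context: "Audience is backend engineers upgrading from v2 to v3.",
  inputs: "- Removed the legacy auth endpoint\n- Renamed retries to maxRetries",
  outputFormat: "A markdown bullet list, one line per change.",
});
console.log(call.task_description);
```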
2. Read the tool result
On success, the subagent returns a JSON result with the worker’s output.
3. Give the worker its own tools
When a subtask needs external data, pass server tools to the worker. The worker runs as a mini agent over those tools before producing its outcome.
- Only OpenRouter server tools work (e.g. openrouter:web_search, openrouter:web_fetch, openrouter:datetime). Function tools are rejected with a 400 because the worker has no client-side executor.
- The worker’s tool list can’t include the subagent tool itself. Recursion guards prevent the worker from re-entering the subagent.
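Because function tools on the worker fail with a 400, you may want to catch misconfiguration before the request leaves your app. This is a minimal sketch. It assumes worker tools nest under a tools field in the subagent's parameters, which is a guess; check the Subagent reference for the real field. assertWorkerTools and subagentWithTools are hypothetical client-side helpers that mirror the two constraints listed above.

```typescript
interface ServerTool {
  type: string;
  parameters?: Record<string, unknown>;
}

// Hypothetical client-side guard mirroring the documented constraints:
// only openrouter:* server tools, and never the subagent tool itself.
function assertWorkerTools(tools: ServerTool[]): void {
  for (const tool of tools) {
    if (!tool.type.startsWith("openrouter:")) {
      throw new Error(`Worker tool "${tool.type}" is not an OpenRouter server tool`);
    }
    if (tool.type === "openrouter:subagent") {
      throw new Error("The worker cannot list the subagent tool");
    }
  }
}

// ASSUMPTION: worker tools are passed as parameters.tools on the subagent.
function subagentWithTools(workerModel: string, workerTools: ServerTool[]): ServerTool {
  assertWorkerTools(workerTools);
  return {
    type: "openrouter:subagent",
    parameters: { model: workerModel, tools: workerTools },
  };
}

const researchSubagent = subagentWithTools("~anthropic/claude-haiku-latest", [
  { type: "openrouter:web_search" },
  { type: "openrouter:web_fetch" },
]);
console.log(JSON.stringify(researchSubagent));
```

Failing fast locally gives a clearer error message than a 400 from the API, but the server remains the source of truth for which tools are allowed.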
4. Tune the worker for cost and quality
The subagent’s parameters let you control how the worker generates. Use them to keep cost predictable.
| Parameter | What it controls |
|---|---|
| model | The worker model. Pick the cheapest model that can handle the subtask quality you need. |
| max_completion_tokens | Output token ceiling (including reasoning). Prevents runaway generation on open-ended tasks. |
| temperature | Lower values for deterministic extraction, higher for creative drafting. Range 0 to 2. |
| reasoning | reasoning.effort controls reasoning depth; set it to "low" for fast, cheap tasks. reasoning.max_tokens is accepted and validated but not yet forwarded to the worker. |
| instructions | System prompt for the worker. Use it to shape the worker’s output format and behavior. |
| max_tool_calls | Range 1 to 25. Accepted and validated but not yet enforced on the worker call, so it does not cap cost today. Choose a value you would accept once enforcement lands. |
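A sketch of two worker presets built from the table, plus a client-side check of the documented ranges. The field names come from the table. Their exact shapes are assumptions, for example reasoning as an object with effort and max_tokens. The token ceilings are illustrative. Since reasoning.max_tokens and max_tool_calls are not yet applied to the worker, the presets rely on max_completion_tokens as the actual cap.

```typescript
// ASSUMPTION: field shapes (e.g. reasoning as { effort, max_tokens })
// are inferred from the parameter table; confirm in the Subagent reference.
interface WorkerParams {
  model: string;
  max_completion_tokens?: number;
  temperature?: number;
  reasoning?: { effort?: string; max_tokens?: number };
  instructions?: string;
  max_tool_calls?: number;
}

// Client-side checks for the ranges stated in the table.
function validateWorkerParams(p: WorkerParams): string[] {
  const errors: string[] = [];
  if (p.temperature !== undefined && (p.temperature < 0 || p.temperature > 2)) {
    errors.push("temperature must be between 0 and 2");
  }
  if (
    p.max_tool_calls !== undefined &&
    (!Number.isInteger(p.max_tool_calls) || p.max_tool_calls < 1 || p.max_tool_calls > 25)
  ) {
    errors.push("max_tool_calls must be an integer from 1 to 25");
  }
  if (p.max_completion_tokens !== undefined && p.max_completion_tokens <= 0) {
    errors.push("max_completion_tokens must be positive");
  }
  return errors;
}

// Deterministic, tightly capped preset for extraction-style subtasks.
const extractionWorker: WorkerParams = {
  model: "~anthropic/claude-haiku-latest",
  max_completion_tokens: 800,
  temperature: 0,
  reasoning: { effort: "low" },
  instructions: "Return only valid JSON. Do not add commentary.",
};

// Looser preset for drafting, still capped so spend stays predictable.
const draftingWorker: WorkerParams = {
  ...extractionWorker,
  max_completion_tokens: 2000,
  temperature: 0.8,
  instructions: "Write clear, concise prose in the requested format.",
};

console.log(validateWorkerParams(extractionWorker)); // []
console.log(validateWorkerParams({ ...draftingWorker, temperature: 3 }));
```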
Subagent works with both non-streaming and streaming requests. With streaming (stream: true), the server sends : OPENROUTER PROCESSING SSE comments as heartbeats while workers execute. Content chunks resume once the orchestrator continues generating. The final chunk includes the aggregated usage object. See Server tools overview for how server tool usage appears in the response.
Standard SSE parsers ignore comment lines (lines that start with :) automatically, so a consumer only needs to handle the data lines.
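A minimal streaming consumer sketch. It assumes the usual OpenAI-compatible SSE shape: data: lines carrying JSON chunks with choices[0].delta.content, and a closing data: [DONE]. SseChunkParser buffers partial lines across network chunks and skips heartbeat comments. collectFromPieces replays a made-up sample so the logic runs without a network call. consumeStream shows how you would wire the parser to a fetch Response body. All three names are hypothetical.

```typescript
// ASSUMPTION: OpenAI-compatible chunks; only the fields read below are typed.
interface StreamChunk {
  choices?: { delta?: { content?: string } }[];
  usage?: Record<string, number>;
}

// Incremental SSE parser: skips ":" heartbeat comments, parses "data:" JSON.
class SseChunkParser {
  private buffer = "";
  done = false;

  push(text: string): StreamChunk[] {
    this.buffer += text;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep any trailing partial line
    const chunks: StreamChunk[] = [];
    for (const raw of lines) {
      const line = raw.trim();
      if (line === "" || line.startsWith(":")) continue; // blank or heartbeat
      if (!line.startsWith("data:")) continue;
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") {
        this.done = true;
        continue;
      }
      chunks.push(JSON.parse(payload) as StreamChunk);
    }
    return chunks;
  }
}

// Wiring sketch for a real fetch Response (not invoked here).
async function consumeStream(res: Response, onText: (t: string) => void): Promise<void> {
  if (!res.body) throw new Error("Response has no body");
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const parser = new SseChunkParser();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    for (const chunk of parser.push(decoder.decode(value, { stream: true }))) {
      const text = chunk.choices?.[0]?.delta?.content;
      if (text) onText(text);
    }
  }
}

// Replays pre-split text through the parser, keeping the last usage seen.
function collectFromPieces(pieces: string[]) {
  const parser = new SseChunkParser();
  let text = "";
  let usage: Record<string, number> | undefined;
  for (const piece of pieces) {
    for (const chunk of parser.push(piece)) {
      text += chunk.choices?.[0]?.delta?.content ?? "";
      if (chunk.usage) usage = chunk.usage;
    }
  }
  return { text, usage, done: parser.done };
}

// Made-up sample: a heartbeat, a JSON line split across pieces, then usage.
const samplePieces = [
  ": OPENROUTER PROCESSING\n\n",
  'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"choi',
  'ces":[{"delta":{"content":"lo"}}]}\n\n',
  'data: {"choices":[{"delta":{}}],"usage":{"total_tokens":42}}\n\ndata: [DONE]\n\n',
];
const result = collectFromPieces(samplePieces);
console.log(result.text); // Hello
```

Buffering the trailing partial line matters because network chunks do not respect SSE line boundaries, as the second and third sample pieces show.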
5. Log delegation routing, not task content
Add telemetry where your app already records model calls. Log the routing decision and cost, not the content.
Log:
- orchestrator_model
- worker_model
- did_enable_delegation (whether you configured the subagent tool on this request)
- finish_reason
- usage.prompt_tokens (or usage.input_tokens), usage.completion_tokens (or usage.output_tokens), usage.total_tokens, and usage.cost when returned
- route or feature name, such as delegated_analysis
Don’t log:
- API keys
- cookies
- full task descriptions
- full worker outcomes
- user content (unless your product already has an explicit retention policy)
The usage object in the response reflects the combined token spend of the orchestrator plus all worker calls, per Server tools: Usage Tracking. You don’t need to track inner costs separately.
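A sketch of a log record that follows the lists above. It reads only the model, tool configuration, finish_reason, and usage, and never touches messages or outcomes. buildDelegationLog and its input types are hypothetical. They also assume the subagent's worker model sits at parameters.model, as in a request that pins the worker there. Token fields fall back between Chat Completions names (prompt_tokens, completion_tokens) and Responses names (input_tokens, output_tokens).

```typescript
// Hypothetical telemetry record: routing and cost only, never content.
interface DelegationLog {
  route: string;
  orchestrator_model: string;
  worker_model: string | null;
  did_enable_delegation: boolean;
  finish_reason: string | null;
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
  cost?: number;
}

interface Usage {
  prompt_tokens?: number;
  input_tokens?: number;
  completion_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
  cost?: number;
}

// Only the fields the log needs; message content is deliberately absent.
interface RequestSummary {
  model: string;
  tools?: { type: string; parameters?: Record<string, unknown> }[];
}
interface ResponseSummary {
  choices?: { finish_reason?: string | null }[];
  usage?: Usage;
}

function buildDelegationLog(route: string, req: RequestSummary, res: ResponseSummary): DelegationLog {
  const subagent = req.tools?.find((t) => t.type === "openrouter:subagent");
  const workerModel = subagent?.parameters?.model; // ASSUMPTION: pinned here
  const usage = res.usage ?? {};
  return {
    route,
    orchestrator_model: req.model,
    worker_model: typeof workerModel === "string" ? workerModel : null,
    did_enable_delegation: subagent !== undefined,
    finish_reason: res.choices?.[0]?.finish_reason ?? null,
    input_tokens: usage.prompt_tokens ?? usage.input_tokens,
    output_tokens: usage.completion_tokens ?? usage.output_tokens,
    total_tokens: usage.total_tokens, // already includes worker spend
    cost: usage.cost, // only present when returned
  };
}

const log = buildDelegationLog(
  "delegated_analysis",
  {
    model: "~anthropic/claude-opus-latest",
    tools: [{ type: "openrouter:subagent", parameters: { model: "~anthropic/claude-haiku-latest" } }],
  },
  { choices: [{ finish_reason: "stop" }], usage: { prompt_tokens: 1200, completion_tokens: 300, total_tokens: 1500 } },
);
console.log(JSON.stringify(log));
```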
Next steps
- Read the Subagent reference for exact parameters, recursion guards, worker tool constraints, and invocation caps.
- Pair subagent with Advisor for a two-tier pattern: cheap worker for routine tasks, strong advisor for uncertain decisions.
- Give the worker Web Search when subtasks need current data.
- Add Response Caching for repeated orchestrator prefixes across similar tasks.
- Use Fusion when subtasks need multi-model deliberation instead of single-worker execution.
- Browse the Model list to compare worker model pricing and find the cheapest model that meets your subtask quality bar.
- Add Structured Outputs to the orchestrator request when you need the final answer in a specific JSON schema.