Run multi-hour agent loops with cost ceilings, resumable state, and voice input
This cookbook assumes you have an OpenRouter API key and are using the Agent
SDK (@openrouter/agent). If you are starting from scratch, read the
Agent SDK overview and the
callModel reference first.
Goal: Run an agent that can keep working for hours, not seconds — research
projects, multi-stage migrations, voice-driven assistants, or background jobs
that span days. The same callModel loop works for all of them once you wire
up four primitives.Outcome: A long-horizon agent that:
Caps total cost and step count so it always terminates.
Persists conversation state so it can be resumed after a crash, deploy, or
human approval.
Streams progress events so dashboards and UIs stay live during the run.
Runs a self-ask loop — research, adversarial review, repeat — until the
agent emits a [DONE] sentinel.
You can hand this page to your coding agent as the implementation brief.
Adapt the storage, ceilings, and surface (CLI, API, queue worker) to your app
rather than scaffold a separate project.
Long-horizon agents must terminate. Combine multiple stop conditions so the
loop ends as soon as the first one fires. The most useful for long runs are
maxCost, stepCountIs, and maxTokensUsed.
import { OpenRouter, tool, stepCountIs, maxCost } from '@openrouter/agent';import { z } from 'zod';const openrouter = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY,});const searchTool = tool({ name: 'search', description: 'Search the web for information', inputSchema: z.object({ query: z.string() }), execute: async ({ query }) => { return { results: await fetchResults(query) }; },});const result = openrouter.callModel({ model: '~anthropic/claude-opus-latest', input: 'Research the fusion energy landscape and produce a 5-page report.', tools: [searchTool], // Stop on whichever fires first. stopWhen: [stepCountIs(200), maxCost(5)],});const text = await result.getText();
See the Stop Conditions reference
for the full list (stepCountIs, hasToolCall, maxTokensUsed, maxCost,
finishReasonIs) and how to compose custom predicates.
Long-horizon runs spend real credits. Always set both a step ceiling and a
cost ceiling before you start a multi-hour run, and start small while you are
iterating.
A multi-hour run must survive restarts, deploys, and human approvals.
callModel accepts a StateAccessor that loads and saves
ConversationState between steps. Back it with whatever storage your app
already uses.
import type { ConversationState, StateAccessor } from '@openrouter/agent';import { readFile, rename, writeFile } from 'node:fs/promises';const fileStateAccessor = (path: string): StateAccessor => ({ load: async () => { // Only swallow ENOENT — real I/O or permission errors should surface // instead of silently restarting the agent from scratch. const raw = await readFile(path, 'utf8').catch((err: NodeJS.ErrnoException) => { if (err.code === 'ENOENT') return null; throw err; }); return raw ? (JSON.parse(raw) as ConversationState) : null; }, // Atomic write: write to a temp file, then rename. POSIX rename is // atomic on the same filesystem, so a crash mid-write cannot leave // a truncated state file that breaks resumption. save: async (state) => { const tmp = `${path}.tmp`; await writeFile(tmp, JSON.stringify(state)); await rename(tmp, path); },});const result = openrouter.callModel({ model: '~anthropic/claude-opus-latest', input: 'Plan and start a 3-day data migration.', tools: [searchTool], state: fileStateAccessor('./run.json'), stopWhen: [stepCountIs(200), maxCost(5)],});await result.getResponse();
To resume after a crash, deploy, or human review, call callModel again with
the same StateAccessor. Pass input: [] to signal “no new user turn —
continue from saved state”; the SDK loads the checkpoint and keeps going.
For production, swap the file accessor for one backed by Postgres, Redis, or
an object store. See Tool Approval & State
for the full StateAccessor and resumption contract.
A run that lasts an hour should not block your UI for an hour. callModel
returns a result object with several streams you can consume independently:
result.getTextStream() — token deltas for the user-facing response.
result.getToolCallsStream() — tool calls as they complete.
result.getFullResponsesStream() — the full event stream, including tool
preliminary results.
result.getResponse() — the final, fully-resolved response with usage data.
const result = openrouter.callModel({ model: '~anthropic/claude-opus-latest', input: 'Build a market analysis report on EV charging.', tools: [searchTool], stopWhen: [stepCountIs(100), maxCost(2)],});// Stream tool calls and text deltas concurrently.const streamToolCalls = (async () => { for await (const call of result.getToolCallsStream()) { publishToDashboard({ kind: 'tool', name: call.name, args: call.arguments }); }})();const streamText = (async () => { for await (const delta of result.getTextStream()) { publishToDashboard({ kind: 'token', delta }); }})();await Promise.all([streamToolCalls, streamText]);const final = await result.getResponse();publishToDashboard({ kind: 'done', usage: final.usage });
See the callModel API reference
for every stream method and event type.Wire publishToDashboard to whatever transport you already use — Server-Sent
Events, WebSockets, a database table, or a pubsub channel.
A single pass through callModel often leaves gaps — unverified citations,
missing edge cases, or stale data. Wrap the run in an outer self-ask loop:
research, adversarial review, repeat until the agent emits a [DONE]
sentinel. Each iteration appends a new user turn to the persisted
StateAccessor, so the agent builds on its prior work instead of starting
over.
import { OpenRouter, stepCountIs, maxCost } from '@openrouter/agent';const openrouter = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY,});const SELF_ASK_MAX_ITERATIONS = 10;const REVIEW_PROMPT = `Review your last response adversarially.- Are there gaps, ambiguities, or unverified claims?- If the work is complete and every claim is verified, reply with only [DONE].- Otherwise list the gaps and keep researching.`;const state = fileStateAccessor('./run.json');let input: string | unknown[] = 'Research the fusion energy landscape and produce a 5-page report.';let final = '';for (let i = 0; i < SELF_ASK_MAX_ITERATIONS; i++) { const result = openrouter.callModel({ model: '~anthropic/claude-opus-latest', input, state, tools: [searchTool], // Per-iteration ceilings. The outer for-loop adds a third guard. stopWhen: [stepCountIs(50), maxCost(2)], }); final = await result.getText(); if (final.includes('[DONE]')) break; // Hand the assistant's own output back as an adversarial reviewer turn. input = REVIEW_PROMPT;}
The [DONE] sentinel is intentionally cheap: any model can produce it, and a
plain String.includes check keeps the control flow obvious. Swap the review
prompt or the reviewer model (for example a faster
~anthropic/claude-sonnet-latest critiquing an Opus researcher) without
changing the loop. Three layers of ceilings keep cost bounded:
SELF_ASK_MAX_ITERATIONS caps the number of review rounds, and each round
inherits its own stepCountIs + maxCost budget.
Pair this with the state accessor from step 2 so the loop survives crashes
mid-review. On resume, re-enter the loop from the saved state and continue
reviewing.
Drive the same agent loop from a voice memo, phone call, or push-to-talk app.
OpenRouter exposes a dedicated
/api/v1/audio/transcriptions
endpoint with a single STT model parameter. Hand the transcript to
callModel exactly like a text prompt.
import { OpenRouter as SDK } from '@openrouter/sdk';import { OpenRouter, stepCountIs, maxCost } from '@openrouter/agent';import { readFile } from 'node:fs/promises';const sdk = new SDK({ apiKey: process.env.OPENROUTER_API_KEY });const agent = new OpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });const audio = await readFile('./voice-memo.wav');const transcription = await sdk.stt.createTranscription({ model: 'openai/whisper-1', inputAudio: { data: audio.toString('base64'), format: 'wav' },});const result = agent.callModel({ model: '~anthropic/claude-opus-latest', input: transcription.text, stopWhen: [stepCountIs(50), maxCost(2)],});const reply = await result.getText();
For a streaming microphone, capture audio chunks on the client, send them to
your server, and call createTranscription once silence is detected. Use the
STT cookbook for the full request and
response shape.
Long-horizon jobs usually run somewhere the user is not watching. Notify them
when the run terminates — by webhook, email, Slack message, or whatever your
stack uses. Trigger the notification once getResponse() resolves so the
agent has fully completed and ceilings have been honored.
const final = await result.getResponse();const webhookUrl = process.env.WEBHOOK_URL;if (!webhookUrl) { throw new Error('WEBHOOK_URL env var is required for webhook notifications');}await fetch(webhookUrl, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ status: 'completed', usage: final.usage, text: await result.getText(), }),});
A correct long-horizon implementation should pass all of the following:
A run with a low maxCost (for example, maxCost(0.10)) returns from
callModel once the ceiling is hit, even if the agent has more work
queued.
Killing the process mid-run and starting a new callModel invocation with
the same StateAccessor resumes from the saved ConversationState. The
message history grows rather than starting over.
getToolCallsStream() and getTextStream() yield events while the agent
is still running, not only at the end.
Sending a voice file through sdk.stt.createTranscription returns the
expected text, and feeding that text into callModel produces a response
that references the spoken request.
A webhook (or other notification) fires after getResponse() resolves.