stream parameter to true in your request. The model will then stream the response to the client in chunks, rather than returning the entire response at once.
Here is an example of how to stream a response, and process it:
Additional Information
For SSE (Server-Sent Events) streams, OpenRouter occasionally sends comments to prevent connection timeouts. These comments look like:X-Generation-Id response header for all endpoints (chat completions, completions, responses, and messages), which can be useful for debugging and correlating requests.
Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you JSON.stringify the non-JSON payloads. We recommend the following clients:
Stream Cancellation
Streaming requests can be cancelled by aborting the connection. For supported providers, this immediately stops model processing and billing.Provider Support
Provider Support
Supported
- OpenAI, Azure, Anthropic
- Fireworks, Mancer, Recursal
- AnyScale, Lepton, OctoAI
- Novita, DeepInfra, Together
- Cohere, Hyperbolic, Infermatic
- Avian, XAI, Cloudflare
- SFCompute, Nineteen, Liquid
- Friendli, Chutes, DeepSeek
- AWS Bedrock, Groq, Modal
- Google, Google AI Studio, Minimax
- HuggingFace, Replicate, Perplexity
- Mistral, AI21, Featherless
- Lynn, Lambda, Reflection
- SambaNova, Inflection, ZeroOneAI
- AionLabs, Alibaba, Nebius
- Kluster, Targon, InferenceNet
Handling Errors During Streaming
OpenRouter handles errors differently depending on when they occur during the streaming process:Errors Before Any Tokens Are Sent
If an error occurs before any tokens have been streamed to the client, OpenRouter returns a standard JSON error response with the appropriate HTTP status code. This follows the standard error format:- 400: Bad Request (invalid parameters)
- 401: Unauthorized (invalid API key)
- 402: Payment Required (insufficient credits)
- 429: Too Many Requests (rate limited)
- 502: Bad Gateway (provider error)
- 503: Service Unavailable (no available providers)
Errors After Tokens Have Been Sent (Mid-Stream)
If an error occurs after some tokens have already been streamed to the client, OpenRouter cannot change the HTTP status code (which is already 200 OK). Instead, the error is sent as a Server-Sent Event (SSE) with a unified structure:- The error appears at the top level alongside standard response fields (id, object, created, etc.)
- A
choicesarray is included withfinish_reason: "error"to properly terminate the stream - The HTTP status remains 200 OK since headers were already sent
- The stream is terminated after this unified error event
Code Examples
Here’s how to properly handle both types of errors in your streaming implementation:API-Specific Behavior
Different API endpoints may handle streaming errors slightly differently:- OpenAI Chat Completions API: Returns
ErrorResponsedirectly if no chunks were processed, or includes error information in the response if some chunks were processed - OpenAI Responses API: May transform certain error codes (like
context_length_exceeded) into a successful response withfinish_reason: "length"instead of treating them as errors