Service Tiers
The service_tier parameter lets you control cost and latency tradeoffs when sending requests through OpenRouter. You can pass it in your request to select a specific processing tier, and the response will indicate which tier was actually used.
Not every model from a provider supports service tiers. Additionally, your requested service tier is not guaranteed to be honored — the provider may serve your request on a different tier depending on availability. The service_tier field in the response indicates which tier was actually used, and you will be billed according to that actual tier.
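As a sketch of the flow described above, the snippet below builds a Chat Completions request body with a requested service_tier and then reads the tier actually used from the response. The model name is a placeholder and the response object is a made-up, abridged sample; the field names follow this document.

```python
import json

# Request body for POST /api/v1/chat/completions on OpenRouter.
# "openai/gpt-4o" is a placeholder model; tier values are provider-specific.
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "service_tier": "flex",  # requested tier; not guaranteed to be honored
}

# Abridged sample response: the provider served this request on a
# different tier than requested (e.g. flex capacity was unavailable).
response = json.loads(
    '{"id": "gen-123", "service_tier": "default", "usage": {"total_tokens": 42}}'
)

# Billing follows the tier actually used, not the one requested.
actual_tier = response.get("service_tier")
print(actual_tier)  # default
```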
Supported Providers
OpenAI
- Accepted request values: auto, default, flex, priority (default if omitted: auto)
- Possible response values: default, flex, priority
Learn more in OpenAI’s Chat Completions and Responses API documentation. See OpenAI’s pricing page for details on cost differences between tiers.
Google (Vertex AI)
- Accepted request values: standard, flex, priority (default if omitted: standard)
- Possible response values: standard, flex, priority
Learn more in Google’s Flex and Priority documentation.
API Response Differences
The API response includes a service_tier field that indicates which capacity tier was actually used to serve your request. The placement of this field varies by API format:
- Chat Completions API (/api/v1/chat/completions): service_tier is returned at the top level of the response object, matching OpenAI’s native format.
- Responses API (/api/v1/responses): service_tier is returned at the top level of the response object, matching OpenAI’s native format.
- Messages API (/api/v1/messages): service_tier is returned inside the usage object, matching Anthropic’s native format.