POST /chat/completions
Create a chat completion
curl --request POST \
  --url https://openrouter.ai/api/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "max_tokens": 150,
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "What is the capital of France?",
      "role": "user"
    }
  ],
  "model": "openai/gpt-4",
  "temperature": 0.7
}
'

Example response:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The capital of France is Paris.",
        "role": "assistant"
      }
    }
  ],
  "created": 1677652288,
  "id": "chatcmpl-123",
  "model": "openai/gpt-4",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 10,
    "prompt_tokens": 25,
    "total_tokens": 35
  }
}
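
For readers who prefer Python, the curl request above can be sketched with the standard library alone. This is a minimal, non-authoritative sketch: the helper names build_request and extract_reply are hypothetical, the payload mirrors the curl example (including max_tokens, which the parameter list below marks as deprecated), and the request object is only built, not sent.

```python
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(token: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": "openai/gpt-4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "max_tokens": 150,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text from the first choice of a response body."""
    return response_body["choices"][0]["message"]["content"]
```

To send the request, pass the result to urllib.request.urlopen and decode the body with json.load before calling extract_reply.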

Authorizations

Authorization
string
header
required

API key as bearer token in Authorization header

Headers

X-OpenRouter-Metadata
enum<string>

Opt-in level for surfacing routing metadata on the response under openrouter_metadata. Defaults to disabled. The legacy header X-OpenRouter-Experimental-Metadata is also accepted for backward compatibility.

Available options:
disabled,
enabled
Example:

"enabled"

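A small sketch of how a client might assemble its headers with the metadata opt-in. The helper build_headers is hypothetical; omitting the header when the level is "disabled" is an assumption based on the documented default.

```python
METADATA_HEADER = "X-OpenRouter-Metadata"
ALLOWED_LEVELS = ("disabled", "enabled")


def build_headers(token: str, metadata: str = "disabled") -> dict:
    """Return request headers, adding the metadata opt-in only when enabled."""
    if metadata not in ALLOWED_LEVELS:
        raise ValueError(f"metadata must be one of {ALLOWED_LEVELS}")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    # Assumption: "disabled" is the default, so the header can be left out.
    if metadata == "enabled":
        headers[METADATA_HEADER] = metadata
    return headers
```
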
Body

application/json

Chat completion request parameters

messages
object[]
required

List of messages for the conversation

Minimum array length: 1

Chat completion message with role-based discrimination

Example:
{
"content": "What is the capital of France?",
"name": "Assistant Config",
"role": "user"
}
Example:
[{ "content": "Hello!", "role": "user" }]
cache_control
object

Enable automatic prompt caching. When set at the top level, the system automatically applies cache breakpoints to the last cacheable block in the request. Currently supported for Anthropic Claude models.

Example:
{ "type": "ephemeral" }
debug
object

Debug options for inspecting request transformations (streaming only)

Example:
{ "echo_upstream_body": true }
frequency_penalty
number<double> | null

Frequency penalty (-2.0 to 2.0)

Example:

0

image_config
object

Provider-specific image configuration options. Keys and values vary by model/provider. See https://openrouter.ai/docs/guides/overview/multimodal/image-generation for more details.

Example:
{ "aspect_ratio": "16:9", "quality": "high" }
logit_bias
object | null

Token logit bias adjustments

Example:
{ "50256": -100 }
logprobs
boolean | null

Return log probabilities

Example:

false

max_completion_tokens
integer | null

Maximum tokens in completion

Example:

100

max_tokens
integer | null

Maximum tokens (deprecated, use max_completion_tokens). Note: some providers enforce a minimum of 16.

Example:

100

metadata
object

Key-value pairs for additional object information (max 16 pairs, 64 char keys, 512 char values)

Example:
{
"session_id": "session-456",
"user_id": "user-123"
}
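The limits above (16 pairs, 64-character keys, 512-character values) can be checked client-side before sending. The helper validate_metadata is hypothetical, and treating every value as a string is an assumption; the server's own validation and error messages may differ.

```python
def validate_metadata(metadata: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(metadata) > 16:
        problems.append(f"too many pairs: {len(metadata)} > 16")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append(f"key too long ({len(key)} chars): {key[:20]}...")
        # Assumption: values are strings; other types are measured via str().
        if len(str(value)) > 512:
            problems.append(f"value too long for key {key[:20]!r}")
    return problems
```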
min_p
number<double> | null

Minimum probability threshold relative to the most likely token. Tokens with probability below min_p * (probability of top token) are filtered out. Not all providers support this parameter.

Example:

0.1
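
The min_p rule can be illustrated on a toy distribution. This is a sketch of the filtering step as described above, not any provider's implementation; the function name min_p_filter and the renormalization of the surviving probabilities are assumptions.

```python
def min_p_filter(probs: dict, min_p: float) -> dict:
    """Drop tokens whose probability is below min_p * (top probability)."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    # Assumption: the remaining probabilities are renormalized to sum to 1.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}
```

With probabilities of 0.6, 0.3, 0.05, and 0.05 and min_p set to 0.1, the threshold is 0.06, so only the first two tokens remain.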

modalities
enum<string>[]

Output modalities for the response. Supported values are "text", "image", and "audio".

Available options:
text,
image,
audio
Example:
["text", "image"]
model
string

Model to use for completion

Example:

"openai/gpt-4"

models
string[]

Models to use for completion

Available OpenRouter chat completion models

Example:
["openai/gpt-4", "openai/gpt-4o"]
parallel_tool_calls
boolean | null

Whether to enable parallel function calling during tool use. When true, the model may generate multiple tool calls in a single response.

Example:

true

plugins
object[]

Plugins you want to enable for this request, including their settings.

Example:
{
"allowed_models": ["anthropic/*", "openai/gpt-4o"],
"cost_quality_tradeoff": 7,
"enabled": true,
"id": "auto-router"
}
presence_penalty
number<double> | null

Presence penalty (-2.0 to 2.0)

Example:

0

provider
object | null

When multiple model providers are available, optionally indicate your routing preference.

Example:
{ "allow_fallbacks": true }
reasoning
object

Configuration options for reasoning models

Example:
{ "effort": "medium", "summary": "concise" }
reasoning_effort
enum<string> | null

Shorthand for setting reasoning effort. Equivalent to setting reasoning.effort. Cannot be used simultaneously with reasoning.effort if they differ.

Available options:
max,
xhigh,
high,
medium,
low,
minimal,
none,
null
Example:

"medium"

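Because reasoning_effort and reasoning.effort must agree when both are set, a client may want to check this before sending. The helper resolve_reasoning_effort is hypothetical; how the API itself reports a conflict is not specified here.

```python
def resolve_reasoning_effort(body: dict):
    """Return the effective effort, or raise if the two settings disagree."""
    shorthand = body.get("reasoning_effort")
    nested = (body.get("reasoning") or {}).get("effort")
    if shorthand is not None and nested is not None and shorthand != nested:
        raise ValueError(
            f"reasoning_effort={shorthand!r} conflicts with "
            f"reasoning.effort={nested!r}"
        )
    return shorthand if shorthand is not None else nested
```
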
repetition_penalty
number<double> | null

Penalizes tokens based on how much they have already appeared in the text. A value of 1.0 means no penalty. Values above 1.0 penalize repeated tokens more strongly. Not all providers support this parameter.

Example:

1

response_format
object

Response format configuration

Example:
{ "type": "json_object" }
route
enum<string> | null
deprecated

DEPRECATED: use providers.sort.partition instead. This parameter is a backwards-compatible alias for providers.sort.partition and accepts the legacy values "fallback" (maps to "model") and "sort" (maps to "none").

Available options:
fallback,
sort,
null
Example:

"fallback"

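When migrating away from route, the documented mapping of legacy values can be captured in a lookup table. The helper partition_for_route is hypothetical, and it returns only the partition value; where that value goes in the request is described by the replacement parameter, not by this sketch.

```python
LEGACY_ROUTE_TO_PARTITION = {
    "fallback": "model",
    "sort": "none",
}


def partition_for_route(route):
    """Translate a legacy route value to its partition equivalent."""
    if route is None:
        return None
    try:
        return LEGACY_ROUTE_TO_PARTITION[route]
    except KeyError:
        raise ValueError(f"unknown legacy route value: {route!r}") from None
```
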
seed
integer | null

Random seed for deterministic outputs

Example:

42

service_tier
enum<string> | null

The service tier to use for processing this request.

Available options:
auto,
default,
flex,
priority,
scale,
null
Example:

"auto"

session_id
string

A unique identifier for grouping related requests (e.g., a conversation or agent workflow). When provided, OpenRouter uses it as the sticky routing key, routing all requests in the session to the same provider to maximize prompt cache hits. Also used for observability grouping. If provided in both the request body and the x-session-id header, the body value takes precedence. Maximum of 256 characters.

Maximum string length: 256
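
The precedence rule (body over x-session-id header) can be sketched as follows. The helper effective_session_id is hypothetical, and applying the 256-character limit to the header value as well as the body value is an assumption.

```python
def effective_session_id(body: dict, headers: dict):
    """Return the session ID that wins: the body value, else the header."""
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "x-session-id"),
        None,
    )
    body_value = body.get("session_id")
    session_id = body_value if body_value is not None else header_value
    # Assumption: the 256-character maximum applies to either source.
    if session_id is not None and len(session_id) > 256:
        raise ValueError("session_id exceeds 256 characters")
    return session_id
```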
stop

Stop sequences (up to 4)

Example:
["\n"]
stop_server_tools_when
object[]

Stop conditions for the server-tool agent loop. Any condition firing halts the loop (OR logic). When set, this overrides max_tool_calls.

Minimum array length: 1

A single condition that, when met, halts the server-tool agent loop.

Example:
{ "step_count": 5, "type": "step_count_is" }
Example:
[
{ "step_count": 5, "type": "step_count_is" },
{
"max_cost_in_dollars": 0.5,
"type": "max_cost"
}
]
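The OR logic can be illustrated with a small evaluator over the two condition types shown in the example. This is a sketch only: should_stop is a hypothetical helper, the loop runs server-side in practice, and the use of "greater than or equal" comparisons is an assumption rather than documented behavior.

```python
def should_stop(conditions: list, step_count: int, cost_in_dollars: float) -> bool:
    """Return True as soon as any single condition is met (OR logic)."""
    for cond in conditions:
        kind = cond["type"]
        # Assumption: thresholds are inclusive for both condition types.
        if kind == "step_count_is" and step_count >= cond["step_count"]:
            return True
        if kind == "max_cost" and cost_in_dollars >= cond["max_cost_in_dollars"]:
            return True
    return False
```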
stream
boolean
default: false

Enable streaming response

Example:

false

stream_options
object | null

Streaming configuration options

Example:
{ "include_usage": true }
temperature
number<double> | null

Sampling temperature (0-2)

Example:

0.7

tool_choice

Tool choice configuration

Available options:
none
Example:

"auto"

tools
object[]

Available tools for function calling

Tool definition for function calling (regular function or OpenRouter built-in server tool)

Example:
{
"function": {
"description": "Get the current weather for a location",
"name": "get_weather",
"parameters": {
"properties": {
"location": {
"description": "City name",
"type": "string"
},
"unit": {
"enum": ["celsius", "fahrenheit"],
"type": "string"
}
},
"required": ["location"],
"type": "object"
}
},
"type": "function"
}
Example:
[
{
"function": {
"description": "Get weather",
"name": "get_weather"
},
"type": "function"
}
]
top_a
number<double> | null

Consider only tokens with "sufficiently high" probabilities based on the probability of the most likely token. Not all providers support this parameter.

Example:

0

top_k
integer | null

Limits the model to choose from the top K most likely tokens at each step. A value of 1 means the model will always pick the most likely next token. Not all providers support this parameter.

Example:

40
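
A toy illustration of top-K filtering as described above. The function top_k_filter is hypothetical and renormalizing the surviving probabilities is an assumption; providers may implement the step differently.

```python
def top_k_filter(probs: dict, k: int) -> dict:
    """Keep the k most likely tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}
```

With k set to 1, only the single most likely token survives, which matches the always-pick-the-top-token behavior noted above.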

top_logprobs
integer | null

Number of top log probabilities to return (0-20)

Example:

5

top_p
number<double> | null

Nucleus sampling parameter (0-1)

Example:

1
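
Nucleus sampling keeps the smallest set of most likely tokens whose cumulative probability reaches top_p. The sketch below shows that selection step on a toy distribution; top_p_filter is a hypothetical name, and the inclusive cutoff and renormalization are assumptions.

```python
def top_p_filter(probs: dict, top_p: float) -> dict:
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        # Assumption: the token that crosses the threshold is included.
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}
```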

trace
object

Metadata for observability and tracing. Known keys (trace_id, trace_name, span_name, generation_name, parent_span_id) have special handling. Additional keys are passed through as custom metadata to configured broadcast destinations.

Example:
{
"trace_id": "trace-abc123",
"trace_name": "my-app-trace"
}
user
string

Unique user identifier

Example:

"user-123"

Response

Successful chat completion response

Chat completion response

choices
object[]
required

List of completion choices

created
integer
required

Unix timestamp of creation

Example:

1677652288

id
string
required

Unique completion identifier

Example:

"chatcmpl-123"

model
string
required

Model used for completion

Example:

"openai/gpt-4"

object
enum<string>
required
Available options:
chat.completion
system_fingerprint
string | null
required

System fingerprint

Example:

"fp_44709d6fcb"

openrouter_metadata
object
Example:
{
"attempt": 1,
"endpoints": {
"available": [
{
"model": "openai/gpt-4o",
"provider": "OpenAI",
"selected": true
}
],
"total": 1
},
"is_byok": false,
"region": "iad",
"requested": "openai/gpt-4o",
"strategy": "direct",
"summary": "available=1, selected=OpenAI"
}
service_tier
string | null

The service tier used by the upstream provider for this request

Example:

"default"

usage
object

Token usage statistics

Example:
{
"completion_tokens": 15,
"completion_tokens_details": { "reasoning_tokens": 5 },
"cost": 0.0012,
"cost_details": {
"upstream_inference_completions_cost": 0.0004,
"upstream_inference_cost": null,
"upstream_inference_prompt_cost": 0.0008
},
"is_byok": false,
"prompt_tokens": 10,
"prompt_tokens_details": { "cached_tokens": 2 },
"server_tool_use_details": {
"tool_calls_executed": 2,
"tool_calls_requested": 2
},
"total_tokens": 25
}
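
A short sketch of reading the usage object. The helper summarize_usage is hypothetical; it assumes total_tokens equals prompt_tokens plus completion_tokens, which holds in the examples on this page, and treats the detail objects as optional.

```python
def summarize_usage(usage: dict) -> dict:
    """Pull the commonly needed numbers out of a usage object."""
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    return {
        # Assumption: the total is the sum of prompt and completion tokens.
        "total_matches": usage["total_tokens"] == prompt + completion,
        "cached_prompt_tokens": prompt_details.get("cached_tokens", 0),
        "reasoning_tokens": completion_details.get("reasoning_tokens", 0),
        "cost": usage.get("cost"),
    }
```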