Guide a Video with Reference Images
Use this guide when you need to add reference-to-video generation where images influence the output without forcing exact first or last frames.
By the end, your implementation should submit a reference-to-video job with
input_references.
For reusable agent knowledge across projects, install the openrouter-video skill.
Before you start
You need:
- An OpenRouter API key available as OPENROUTER_API_KEY
- Node.js 20 or newer
- One or more public HTTPS image URLs, starting with REFERENCE_IMAGE_URL
- A model that supports reference-to-video, confirmed from the current OpenRouter video docs or model description
If you have not chosen a model yet, read Choose a Video Generation Model so you can select one based on your clip duration, output shape, input type, audio, provider controls, and cost requirements.
Use the API reference pages as the source of truth for exact fields:
- Create video generation request
- List video generation models
- TypeScript SDK video generation reference
Use input_references for visual guidance. Use frame_images only when you need exact frame control.
Use stable, directly downloadable image URLs. Some providers cannot fetch image URLs that require cookies, redirects through HTML pages, bot checks, or unusual headers.
Submitting POST /api/v1/videos starts a real video generation job and may
spend OpenRouter credits.
The video models endpoint does not expose a dedicated structured reference-image field for every provider. Confirm reference support from the model description or current docs before you submit:
For example, you can query the model list and inspect the entry for your chosen model.
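A minimal sketch of that check from Node follows; the endpoint path and the data array envelope are assumptions, so confirm both against the List video generation models reference.

```ts
// Minimal sketch: list video models and inspect one entry.
// Assumptions: the endpoint path and the { data: [...] } envelope are
// placeholders; confirm both against the List video generation models reference.
const res = await fetch("https://openrouter.ai/api/v1/videos/models", {
  headers: { Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}` },
});
if (!res.ok) throw new Error(`Model list request failed: ${res.status}`);
const body = (await res.json()) as { data?: Array<{ id: string }> };

// Print the entry for the model you plan to use, then read its duration,
// resolution, and aspect_ratio details (and any reference-image notes) manually.
console.log(body.data?.find((m) => m.id === "bytedance/seedance-2.0-fast"));
```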
For bytedance/seedance-2.0-fast, the model list can confirm the example
duration, resolution, and aspect_ratio; reference-image support may still
need confirmation from the model description or docs.
Step 1: Write a prompt that tells the model how to use the references
Reference images work best when the prompt explains what should stay consistent.
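For example, an illustrative prompt that names what the references control and what the model is free to invent:

```ts
// Illustrative prompt only: state what the references should keep consistent
// (subject, outfit, style) and what the model may invent (setting, motion, camera).
const prompt =
  "Use the reference images for the character's face, hair, and outfit. " +
  "Keep those consistent while she walks through a rainy neon-lit street at night, " +
  "camera tracking slowly from the side.";
```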
Step 2: Submit the reference-to-video job
Build the video request with input_references when the images should guide
subject, identity, or style. Unlike frame_images, reference images are not
exact frame anchors.
Example request shape:
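A minimal sketch of the submit call, assuming input_references accepts bare URL strings and that the model and prompt field names are as shown; confirm the exact fields against the Create video generation request reference.

```ts
// Minimal submit sketch. input_references comes from this guide; the other
// field names (model, prompt) and the bare-URL-string form are assumptions
// to confirm against the Create video generation request reference.
const submitRes = await fetch("https://openrouter.ai/api/v1/videos", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "bytedance/seedance-2.0-fast",
    prompt:
      "Use the reference image for the character's face and outfit; keep them " +
      "consistent while she walks through a rainy, neon-lit street at night.",
    input_references: [process.env.REFERENCE_IMAGE_URL],
  }),
});
if (!submitRes.ok) throw new Error(`Submit failed: ${submitRes.status}`);
const job = await submitRes.json();
console.log(job); // id and status fields, returned immediately
```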
The submit call returns the job fields immediately. In the QA run, the submitted job later completed and the finished video downloaded successfully.
Step 3: Add more references when consistency matters
Some models can use multiple reference images. Before doing this in production, check the current docs or model description for the selected model, then start with the smallest number of references that gives you enough consistency.
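One way to gather the URLs, using extra environment variable names (REFERENCE_IMAGE_URL_2 and so on) purely for illustration:

```ts
// Collect reference URLs, smallest useful set first. The extra env var names
// are illustrative; use whatever configuration you already have.
const inputReferences = [
  process.env.REFERENCE_IMAGE_URL,
  process.env.REFERENCE_IMAGE_URL_2,
  process.env.REFERENCE_IMAGE_URL_3,
].filter((url): url is string => Boolean(url));
```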
Then set input_references in the request body to inputReferences.
Request shape for the optional multi-reference path:
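A sketch of that request, reusing the inputReferences array from above; as before, the field names other than input_references are assumptions to confirm against the Create video generation request reference.

```ts
// Same submit call as Step 2, but input_references now carries the full array.
const res = await fetch("https://openrouter.ai/api/v1/videos", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "bytedance/seedance-2.0-fast",
    prompt:
      "Keep the character and the art style from the reference images consistent " +
      "across the whole clip.",
    input_references: inputReferences,
  }),
});
if (!res.ok) throw new Error(`Submit failed: ${res.status}`);
const job = await res.json();
```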
Step 4: Poll and download
After submission, poll from a server route, worker, or job runner instead of the browser. Keep the flow explicit: poll with a limit, stop on terminal failure, then download the completed video.
Example polling and download helper:
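A sketch of such a helper; the GET path for a single job, the status values, and the field carrying the finished video URL are all assumptions to confirm against the API reference.

```ts
import { writeFile } from "node:fs/promises";

// Sketch of an explicit polling loop. Assumptions to confirm against the API
// reference: the GET /api/v1/videos/{id} path, the status values, and the
// field that carries the downloadable video URL.
async function pollAndDownload(jobId: string, outPath = "output.mp4") {
  const headers = { Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}` };

  for (let attempt = 0; attempt < 60; attempt++) {
    const res = await fetch(`https://openrouter.ai/api/v1/videos/${jobId}`, { headers });
    if (!res.ok) throw new Error(`Poll failed: ${res.status}`);
    const job = await res.json();

    if (job.status === "completed") {
      // Assumed field name for the finished asset URL.
      const videoRes = await fetch(job.video_url);
      await writeFile(outPath, Buffer.from(await videoRes.arrayBuffer()));
      return outPath;
    }
    if (job.status === "failed") {
      // Stop on terminal failure instead of polling forever.
      throw new Error(`Generation failed: ${JSON.stringify(job)}`);
    }
    await new Promise((r) => setTimeout(r, 10_000)); // wait 10s between polls
  }
  throw new Error("Timed out waiting for the video job to complete");
}
```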
The QA run saved the finished video after polling completed.
Check your work
The output should borrow subject, style, or identity cues from the reference images while still following the generated scene described in the prompt. The implementation should produce a playable MP4 from the completed job.
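As a quick signature check (the output.mp4 filename is illustrative), you can confirm the file at least looks like an MP4 container; open it in a player for a real playback check.

```ts
import { readFile } from "node:fs/promises";

// MP4 files start with a 4-byte box size followed by the ASCII tag "ftyp".
// This only confirms the container signature, not that the video plays.
const bytes = await readFile("output.mp4");
const looksLikeMp4 =
  bytes.length > 8 && bytes.subarray(4, 8).toString("ascii") === "ftyp";
console.log(looksLikeMp4 ? "Looks like an MP4 container" : "Unexpected file signature");
```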