The VEED Subtitle API transforms raw footage into polished, publish-ready content with professional burned-in subtitles, all in a single API call. Submit a video URL, choose a preset, and receive a styled MP4. The pipeline covers transcription, styling, and rendering, with pricing starting at $0.10/min via fal.ai.
Quick facts
- What it does: transcription + visual styling + burned-in render in one API call
- Input: MP4, MOV, WebM, M4V, or GIF via URL; optional SRT to skip transcription
- Output: MP4 with subtitles burned in
- Languages: 100+ with 98%+ auto-caption accuracy
- Presets: 7 Dynamic (glass, whisper, glide, glide2, fusion, terminal, handwritten) + 21 Basic
- Pricing: from $0.10/min — 2x multiplier for >1080p resolution, 2x for Dynamic presets
- Max duration: 2 hours at ≤1080p · 1 hour above 1080p
- Available now: fal.ai/models/veed/subtitles
What is the VEED Subtitle API?
The VEED Subtitle API is an end-to-end subtitle generation endpoint built for automated video production pipelines. You send a video URL. The API transcribes the audio using VEED's transcription pipeline, auto-detects the language, applies one of 28 style presets, and returns a finished MP4 with subtitles burned in via a dedicated C++ render-node.
This is different from a pure speech-to-text API. Services like Deepgram, AssemblyAI, and Whisper return a transcript, but getting from a transcript to a styled, burned-in video requires five more layers: timestamp alignment, line-break logic, visual styling, frame rendering, and output formatting. The VEED Subtitle API handles all five in a single call.
It is available via fal.ai, priced per minute of input video. No seats. No subscription tiers. From $0.10/min.
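As a rough illustration of the per-minute pricing, the Python sketch below estimates a job's cost from the figures quoted above. The function name, the fractional-minute billing, and the assumption that the two 2x multipliers stack to 4x are illustrative guesses rather than documented behavior, so confirm them against the fal.ai pricing page before relying on the numbers.

```python
BASE_RATE_USD_PER_MIN = 0.10  # "from $0.10/min" as quoted in this article

# Dynamic presets as listed in this article; every other preset is treated as Basic.
DYNAMIC_PRESETS = {
    "glass", "whisper", "glide", "glide2", "fusion", "terminal", "handwritten",
}


def estimate_cost_usd(duration_seconds: float, preset: str, above_1080p: bool) -> float:
    """Return a rough cost estimate in USD (hypothetical helper, not part of the API)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")

    multiplier = 1.0
    if preset in DYNAMIC_PRESETS:
        multiplier *= 2.0  # 2x for Dynamic presets
    if above_1080p:
        multiplier *= 2.0  # 2x above 1080p; ASSUMPTION: stacks with the preset multiplier

    minutes = duration_seconds / 60.0  # ASSUMPTION: billed on fractional minutes
    return round(BASE_RATE_USD_PER_MIN * minutes * multiplier, 2)
```

Under these assumptions, a 10-minute 1080p clip with a Basic preset comes to about $1.00, and the same clip with a Dynamic preset to about $2.00.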
What the Subtitle API includes
How it works
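The Subtitle API follows a three-step flow: submit a video URL with a preset, poll the status endpoint using the returned request_id until the job completes, then fetch the rendered MP4 from the output URL.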
A minimal request looks like this:
POST https://fal.run/veed/subtitles
{
  "video_url": "https://your-bucket.s3.amazonaws.com/clip.mp4",
  "preset": "glass",   // Dynamic preset; see full list below
  "language": null     // null = auto-detect; or pass e.g. "en", "es", "fr"
}
// Response: request_id returned immediately
// Poll status endpoint until complete
// Final output: { "video": { "url": "...output.mp4", "content_type": "video/mp4" } }
- Dynamic Presets: glass, whisper, glide, glide2, fusion, terminal, handwritten
- Basic Presets: simple, plain, beans, corpo, boo, shadeplay, casper, capri, lowkey, vinta, diego, ali, slay, kitty, hustle, karl, sprout, flex, mint, rizz, vegas
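The request and polling flow above could be wrapped along these lines. This is a minimal Python sketch, not VEED's or fal.ai's client: build_request and wait_for_result are hypothetical helpers, the get_status callable stands in for whatever HTTP call fetches the job status, and the "COMPLETED" and "FAILED" status strings are assumptions to verify against the fal.ai documentation. Only the three request fields and the preset names come from this article.

```python
import time
from typing import Callable, Optional

DYNAMIC_PRESETS = (
    "glass", "whisper", "glide", "glide2", "fusion", "terminal", "handwritten",
)
BASIC_PRESETS = (
    "simple", "plain", "beans", "corpo", "boo", "shadeplay", "casper",
    "capri", "lowkey", "vinta", "diego", "ali", "slay", "kitty", "hustle",
    "karl", "sprout", "flex", "mint", "rizz", "vegas",
)
ALL_PRESETS = DYNAMIC_PRESETS + BASIC_PRESETS


def build_request(video_url: str, preset: str, language: Optional[str] = None) -> dict:
    """Build the JSON body shown above, rejecting preset names not listed here."""
    if preset not in ALL_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    # language=None serializes to JSON null, which the article says means auto-detect.
    return {"video_url": video_url, "preset": preset, "language": language}


def wait_for_result(
    get_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status repeatedly until the job finishes, fails, or times out.

    get_status is a hypothetical callable returning a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_status()
        status = state.get("status")
        if status == "COMPLETED":  # ASSUMPTION: terminal success value
            return state
        if status == "FAILED":  # ASSUMPTION: terminal failure value
            raise RuntimeError(f"subtitle job failed: {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError("subtitle job did not finish in time")
        sleep(interval_s)
```

Injecting get_status and sleep keeps the polling loop independent of any particular HTTP client and makes it easy to exercise without network access.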
What you can build with it
The Subtitle API is designed for high-volume, automated video pipelines. Common use cases include:
- Social content pipelines — auto-caption TikToks, Reels, and Shorts at volume using Dynamic presets like glass or whisper. Submit batches in parallel; safe zone rendering ensures subtitles are never cut off by platform UI.
- YouTube upload automation — push recordings through the API the moment they wrap. With 98%+ transcription accuracy across 100+ languages and a 2-hour max duration for ≤1080p content, the pipeline handles long-form as easily as short-form.
- Ad creative at scale — caption UGC and talking-head clips across an entire creative backlog. Use Basic presets at $0.10/min for standard output, or Dynamic presets for premium ad units that need high-fidelity animations.
- Multilingual content libraries — bring a translated SRT per language, apply VEED's presets, and export styled captions for every market from a single pipeline. Pass a language code manually for higher accuracy on specialist terminology.
- Accessibility-first publishing — every video published with burned-in captions, automatically. Safe zone rendering ensures captions are readable on every screen size and never obscured by platform UI.
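For the batch-oriented use cases above, parallel submission might look like the following Python sketch. It uses the standard library's ThreadPoolExecutor; submit_batch and the submit_job callable are hypothetical names, with submit_job standing in for whatever function posts one request body and returns its request_id. Concurrency limits on the real service are not covered in this article, so max_workers is a placeholder value.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def submit_batch(
    video_urls: Iterable[str],
    submit_job: Callable[[dict], str],
    preset: str = "glass",
    max_workers: int = 4,  # placeholder; tune to the service's actual limits
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Submit one job per URL in parallel.

    Returns (request_ids_by_url, errors_by_url) so one failed submission
    does not abort the rest of the batch.
    """
    request_ids: Dict[str, str] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(
                submit_job,
                {"video_url": url, "preset": preset, "language": None},
            ): url
            for url in video_urls
        }
        for future in as_completed(futures):
            url = futures[future]
            try:
                request_ids[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return request_ids, errors
```

Each returned request_id can then be polled independently until its MP4 is ready.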
How the VEED Subtitle API compares
There are two categories of API used for subtitle workflows: pure speech-to-text (STT) APIs, and subtitle APIs that return styled video. VEED sits in the second category and goes further on styling quality, customisation, and pipeline completeness.
When not to use the Subtitle API
The VEED Subtitle API outputs a rendered MP4. It is not the right fit for:
- Raw transcript or SRT output only — the API outputs a rendered MP4, not text. Use Deepgram, AssemblyAI, or Whisper if you only need a transcript.
- Real-time or live captioning — the async job model is not designed for sub-second latency. A real-time option is not currently available.
- Subtitle translation — the API transcribes and styles; it does not translate. Bring a pre-translated SRT if you need captions in a different language from the source audio.
- Webhook-based job completion — the current implementation uses status polling, not push notifications.
- Videos exceeding duration limits — maximum input is 2 hours at ≤1080p and 1 hour above 1080p.
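The duration limits in the last item can be checked before submitting a job. The Python sketch below is one way to do that; the helper names are hypothetical, and treating a video as "above 1080p" when its shorter side exceeds 1080 pixels is an assumption, since this article does not say how resolution is measured for portrait video.

```python
MAX_SECONDS_UP_TO_1080P = 2 * 60 * 60  # 2 hours at 1080p or below
MAX_SECONDS_ABOVE_1080P = 1 * 60 * 60  # 1 hour above 1080p


def is_above_1080p(width: int, height: int) -> bool:
    """ASSUMPTION: resolution is judged by the shorter side of the frame."""
    return min(width, height) > 1080


def max_duration_seconds(width: int, height: int) -> int:
    """Return the duration limit that applies to a video of this size."""
    if is_above_1080p(width, height):
        return MAX_SECONDS_ABOVE_1080P
    return MAX_SECONDS_UP_TO_1080P


def check_duration(duration_seconds: float, width: int, height: int) -> None:
    """Raise ValueError if the video is longer than the applicable limit."""
    limit = max_duration_seconds(width, height)
    if duration_seconds > limit:
        raise ValueError(
            f"video is {duration_seconds:.0f}s long; limit at {width}x{height} is {limit}s"
        )
```

Running this check client-side avoids submitting a job that would exceed the stated limits.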
The first API to go from raw video to publish-ready captions in one call
Every other subtitle API stops at the transcript. VEED is the first to handle transcription, styling, and render in a single call — returning a post-ready MP4 built on the same presets and C++ render-node powering millions of videos in the VEED editor.
It's also the first in a growing suite of VEED video APIs on fal.ai, alongside Fabric 1.0, the Lip Sync API, and Background Removal — all heading in the same direction: professional video production as composable API endpoints.