Introducing the VEED Subtitle API
by
Georgie Kemp

The VEED Subtitle API turns raw footage into publish-ready video with professional burned-in subtitles in a single API call. Submit a video URL, choose a preset, and receive a styled MP4. The call covers the full pipeline: transcription, styling, and rendering. Pricing starts at $0.10/min via fal.ai.

Quick facts

  • What it does: transcription + visual styling + burned-in render in one API call
  • Input: MP4, MOV, WebM, M4V, or GIF via URL; optional SRT to skip transcription
  • Output: MP4 with subtitles burned in
  • Languages: 100+ with 98%+ auto-caption accuracy
  • Presets: 7 Dynamic (glass, whisper, glide, glide2, fusion, terminal, handwritten) + 21 Basic
  • Pricing: from $0.10/min — 2x multiplier for >1080p resolution, 2x for Dynamic presets
  • Max duration: 2 hours at ≤1080p · 1 hour above 1080p
  • Available now: fal.ai/models/veed/subtitles

What is the VEED Subtitle API?

The VEED Subtitle API is an end-to-end subtitle generation endpoint built for automated video production pipelines. You send a video URL. The API transcribes the audio using VEED's transcription pipeline, auto-detects the language, applies one of 28 style presets, and returns a finished MP4 with subtitles burned in via a dedicated C++ render-node.

This is different from a pure speech-to-text API. Services like Deepgram, AssemblyAI, and Whisper return a transcript — but getting from a transcript to a styled, burned-in video requires five more layers: timestamp alignment, line-break logic, visual styling, frame rendering, and output formatting. Our Subtitle API handles all five in a single call.

It is available via fal.ai, priced per minute of input video. No seats. No subscription tiers. From $0.10/min.

What the Subtitle API includes

Feature | What it means for your pipeline
Full pipeline in one call | The API handles transcription, styling, and render via a dedicated C++ render-node, so there are no extra layers to build or maintain
28 style presets | 7 Dynamic presets (glass, whisper, glide, glide2, fusion, terminal, handwritten) and 21 Basic presets, the same ones powering millions of VEED videos and battle-tested for readability on mute
100+ languages, 98%+ accuracy | Auto-detects the input language from the audio, or pass a language code manually for higher accuracy on specialist terminology
Optional SRT input | Pass an SRT file to skip transcription entirely and go straight to styling and render; useful for reviewed transcripts, translated captions, or precise timing requirements
Safe zone rendering | Subtitles are automatically placed in safe zones for every aspect ratio (9:16, 1:1, 16:9) so they are never hidden behind platform UI on TikTok, Reels, YouTube, or elsewhere
Basic and Dynamic preset tiers | Basic presets (1x pricing multiplier) for standard styling; Dynamic (2x) for high-fidelity animations. Choose based on use case and cost requirements
Resolution-aware pricing | Standard (≤1080p) billed at 1x; Pro (>1080p) at 2x. Multipliers compound: a 4K Dynamic render is $0.10 × 2.0 × 2.0 = $0.40/min
Per-minute billing via fal.ai | From $0.10/min based on input video duration. No seats, no subscription tiers, no minimum commitments
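
To make the compounding multipliers concrete, here is a minimal Python sketch of the pricing arithmetic described above. The function name estimate_cost and the height_px argument are illustrative assumptions, not part of the API. The sketch also assumes cost is linear in minutes with no rounding up to whole minutes, which this article does not specify.

```python
# Illustrative cost estimate based on the rates quoted in this article.
BASE_RATE_PER_MIN = 0.10  # USD, the "from $0.10/min" starting price
DYNAMIC_PRESETS = {
    "glass", "whisper", "glide", "glide2", "fusion", "terminal", "handwritten",
}


def estimate_cost(duration_min: float, preset: str, height_px: int) -> float:
    """Return an estimated cost in USD for one render.

    Assumption: "above 1080p" is judged by height_px, and billing is
    linear in duration. Check fal.ai for the actual billing rules.
    """
    multiplier = 1.0
    if preset in DYNAMIC_PRESETS:
        multiplier *= 2.0  # Dynamic presets are billed at 2x
    if height_px > 1080:
        multiplier *= 2.0  # Pro resolution (>1080p) is billed at 2x
    return round(BASE_RATE_PER_MIN * multiplier * duration_min, 2)


if __name__ == "__main__":
    # One minute of 4K video with a Dynamic preset: 0.10 x 2 x 2
    print(estimate_cost(1, "glass", 2160))
```

Under these assumptions, a 10-minute 1080p clip with a Basic preset such as simple would come to $1.00, and the same clip with glass would come to $2.00.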

How it works

The Subtitle API follows a three-step flow:

Step 01

Submit your video

Pass a video URL (MP4, MOV, WebM, M4V, or GIF) via the video_url parameter. Use FAL storage, a presigned URL, or any publicly accessible link. Optionally pass an SRT via srt_url to skip transcription.

Step 02

Choose a preset

Pass a preset name via the preset parameter. Dynamic presets (glass, whisper, glide, glide2, fusion, terminal, handwritten) produce high-fidelity animations. Basic presets produce standard styling at half the cost.

Step 03

Poll for your output

A request_id is returned immediately. Poll the fal.ai status endpoint until the job completes. The final output is an MP4 with subtitles burned in via VEED's C++ render-node, ready to publish.
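
The polling step can be sketched as a small loop. This is a minimal Python sketch, not VEED's or fal.ai's client code: poll_until_complete and its parameters are hypothetical names, the status lookup is passed in as a function so no endpoint URL or auth header is assumed here, and the "COMPLETED" status string is an assumption to check against the fal.ai documentation.

```python
import time
from typing import Callable


def poll_until_complete(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() repeatedly until it reports completion.

    fetch_status is any function returning a dict with a "status" key.
    Assumption: a finished job reports status "COMPLETED".
    Raises TimeoutError if the job is not done within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("status") == "COMPLETED":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("subtitle job did not complete in time")
        sleep(interval_s)  # wait before asking again


if __name__ == "__main__":
    # Stand-in for real status calls: two pending responses, then done.
    fake = iter([{"status": "IN_QUEUE"}, {"status": "IN_PROGRESS"},
                 {"status": "COMPLETED"}])
    print(poll_until_complete(lambda: next(fake), interval_s=0.0))
```

In a real pipeline, fetch_status would wrap an authenticated request to the fal.ai status endpoint for your request_id. Passing it in as a function keeps the loop easy to test and lets you swap in whichever HTTP client you already use.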

A minimal request looks like this:

POST https://fal.run/veed/subtitles

{
  "video_url": "https://your-bucket.s3.amazonaws.com/clip.mp4",
  "preset":   "glass",   // Dynamic preset — see full list below
  "language": null     // null = auto-detect; or pass e.g. "en", "es", "fr"
}

// Response: request_id returned immediately
// Poll status endpoint until complete
// Final output: { "video": { "url": "...output.mp4", "content_type": "video/mp4" } }
  • Dynamic presets: glass, whisper, glide, glide2, fusion, terminal, handwritten
  • Basic presets: simple, plain, beans, corpo, boo, shadeplay, casper, capri, lowkey, vinta, diego, ali, slay, kitty, hustle, karl, sprout, flex, mint, rizz, vegas
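
Because a mistyped preset name is an easy way to waste a request, it can help to build the request body in code and validate it first. The following is a minimal Python sketch using only the parameters named in this article (video_url, preset, language, srt_url). The helper build_request is a hypothetical name, and how the API itself responds to an unknown preset is not covered here.

```python
import json
from typing import Optional

DYNAMIC_PRESETS = (
    "glass", "whisper", "glide", "glide2", "fusion", "terminal", "handwritten",
)
BASIC_PRESETS = (
    "simple", "plain", "beans", "corpo", "boo", "shadeplay", "casper",
    "capri", "lowkey", "vinta", "diego", "ali", "slay", "kitty", "hustle",
    "karl", "sprout", "flex", "mint", "rizz", "vegas",
)


def build_request(
    video_url: str,
    preset: str,
    language: Optional[str] = None,
    srt_url: Optional[str] = None,
) -> dict:
    """Build the JSON body for a subtitle request.

    language=None maps to JSON null, which the article describes as
    auto-detect. srt_url is only included when provided.
    """
    if preset not in DYNAMIC_PRESETS and preset not in BASIC_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    body = {"video_url": video_url, "preset": preset, "language": language}
    if srt_url is not None:
        body["srt_url"] = srt_url
    return body


if __name__ == "__main__":
    body = build_request("https://example.com/clip.mp4", "glass")
    print(json.dumps(body, indent=2))
```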

What you can build with it

The Subtitle API is designed for high-volume, automated video pipelines. Common use cases include:

  • Social content pipelines — auto-caption TikToks, Reels, and Shorts at volume using Dynamic presets like glass or whisper. Submit batches in parallel; safe zone rendering ensures subtitles are never cut off by platform UI.
  • YouTube upload automation — push recordings through the API the moment they wrap. With 98%+ transcription accuracy across 100+ languages and a 2-hour max duration for ≤1080p content, the pipeline handles long-form as easily as short-form.
  • Ad creative at scale — caption UGC and talking-head clips across an entire creative backlog. Use Basic presets at $0.10/min for standard output, or Dynamic presets for premium ad units that need high-fidelity animations.
  • Multilingual content libraries — bring a translated SRT per language, apply VEED's presets, and export styled captions for every market from a single pipeline. Pass a language code manually for higher accuracy on specialist terminology.
  • Accessibility-first publishing — every video published with burned-in captions, automatically. Safe zone rendering ensures captions are readable on every screen size and never obscured by platform UI.
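
As an illustration of the multilingual and parallel-batch cases above, here is a minimal Python sketch that builds one job per translated SRT and submits them concurrently. The function names, the example URLs, and the submit callable are all hypothetical. Sending language together with srt_url is an assumption, and any concurrency limits on the fal.ai side are not covered in this article.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def build_localized_jobs(
    video_url: str, preset: str, srt_urls_by_language: dict
) -> list:
    """Return one request body per language, sorted by language code."""
    return [
        {
            "video_url": video_url,
            "preset": preset,
            "language": language,  # assumption: sent alongside srt_url
            "srt_url": srt_url,    # translated SRT, so transcription is skipped
        }
        for language, srt_url in sorted(srt_urls_by_language.items())
    ]


def submit_all(
    jobs: list, submit: Callable[[dict], str], max_workers: int = 4
) -> list:
    """Submit jobs in parallel; results come back in the same order as jobs.

    submit is whatever function posts one body and returns its request_id.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, jobs))


if __name__ == "__main__":
    jobs = build_localized_jobs(
        "https://example.com/talk.mp4",
        "simple",
        {"fr": "https://example.com/talk.fr.srt",
         "es": "https://example.com/talk.es.srt"},
    )
    # Stand-in for a real submit function that would call the API.
    print(submit_all(jobs, lambda job: "req-" + job["language"]))
```

Each returned request_id can then be polled independently, so a slow render in one language does not hold up the others.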

How the VEED Subtitle API compares

There are two categories of API used for subtitle workflows: pure speech-to-text (STT) APIs, and subtitle APIs that return styled video. VEED sits in the second category and goes further on styling quality, customisation, and pipeline completeness.

Capability | STT APIs (Deepgram, AssemblyAI, Whisper) | Other subtitle APIs (ZapCap, Submagic, Captions) | VEED Subtitle API
Transcription | ✓ Yes | ✓ Yes | ✓ Yes, 98%+ accuracy
Language support | Varies | Varies | 100+ languages
Timestamp alignment | Build it yourself | ✓ Yes | ✓ Yes
Visual style presets | - | Basic | 28 presets (Basic + Dynamic)
High-fidelity animations | - | Some | 7 Dynamic presets
Burned-in video render | Build it yourself | ✓ Yes | C++ render-node
Safe zone rendering | - | Rarely | ✓ All aspect ratios
SRT input (skip transcription) | - | Rarely | ✓ Yes
Parallel async jobs | ✓ Yes | Some | ✓ Yes
Output format | Text / SRT | MP4 | Styled MP4
Starting price | ~$0.006/min | ~$0.10/min | $0.10/min

Source: VEED internal pricing benchmark + official provider pricing pages, April 2026. Verify all pricing before publishing.

When not to use the Subtitle API

The VEED Subtitle API outputs a rendered MP4. It is not a fit for:

  • Raw transcript or SRT output only — the API outputs a rendered MP4, not text. Use Deepgram, AssemblyAI, or Whisper if you only need a transcript.
  • Real-time or live captioning — the async job model is not designed for sub-second latency. A real-time option is not currently available.
  • Subtitle translation — the API transcribes and styles; it does not translate. Bring a pre-translated SRT if you need captions in a different language from the source audio.
  • Webhook-based job completion — the current implementation uses status polling, not push notifications.
  • Videos exceeding duration limits — maximum input is 2 hours at ≤1080p and 1 hour above 1080p.
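
The format and duration limits can be checked before submitting, which avoids paying for a job that cannot succeed. This is a minimal Python sketch: preflight is a hypothetical helper, the extension check is only a heuristic (a presigned URL may not end in a file extension), and treating "above 1080p" as a height_px comparison is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

ACCEPTED_EXTENSIONS = {".mp4", ".mov", ".webm", ".m4v", ".gif"}
MAX_SECONDS_STANDARD = 2 * 60 * 60  # 2 hours at 1080p or below
MAX_SECONDS_PRO = 1 * 60 * 60       # 1 hour above 1080p


def preflight(video_url: str, duration_s: float, height_px: int) -> list:
    """Return a list of human-readable problems; empty means OK to submit."""
    problems = []

    # Look at the URL path only, so query strings on presigned URLs are ignored.
    extension = PurePosixPath(urlparse(video_url).path).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported or unknown format: {extension or 'none'}")

    limit = MAX_SECONDS_PRO if height_px > 1080 else MAX_SECONDS_STANDARD
    if duration_s > limit:
        problems.append(f"duration {duration_s:.0f}s exceeds limit of {limit}s")

    return problems


if __name__ == "__main__":
    print(preflight("https://example.com/clip.mp4?sig=abc", 5400, 2160))
```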

The first API to go from raw video to publish-ready captions in one call

Pure speech-to-text APIs stop at the transcript. VEED is the first to handle transcription, styling, and render in a single call, returning a post-ready MP4 built on the same presets and C++ render-node powering millions of videos in the VEED editor.

It's also the first in a growing suite of VEED video APIs on fal.ai, alongside Fabric 1.0, the Lip Sync API, and Background Removal — all heading in the same direction: professional video production as composable API endpoints.

FAQ

What is the VEED Subtitle API?

The VEED Subtitle API is an end-to-end subtitle generation endpoint. It accepts a video URL, auto-transcribes the audio, applies a visual style preset, and returns a finished MP4 with subtitles burned in. It handles the full pipeline — transcription, styling, and rendering — in a single call, and is available via fal.ai with per-minute pricing.

How does the VEED Subtitle API work?

Submit a video URL (MP4, MOV, WebM, M4V, or GIF) via POST to the fal.ai endpoint. The API returns a request_id immediately, then transcribes the audio, detects the language, and applies your chosen style preset. Poll the status endpoint until the job completes. The final output is a styled MP4 with subtitles burned in, ready to publish.

How much does the VEED Subtitle API cost?

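Pricing starts at $0.10 per minute of input video, billed via fal.ai. Dynamic presets and resolutions above 1080p each apply a 2x multiplier, and the multipliers compound, so a 4K Dynamic render costs $0.40/min. There are no seats, subscription tiers, or minimum commitments. You pay for exactly what you process, which keeps costs predictable at any volume.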

Does it work for YouTube, TikTok & Instagram?

Yes. VEED's style presets are tuned for readability across vertical (9:16), square (1:1), and horizontal (16:9) aspect ratios, with font sizes, line spacing, and contrast levels optimised for social-first viewing on mute. The API processes any aspect ratio without extra configuration.

What video formats does the Subtitle API accept?

The API accepts MP4, MOV, WebM, M4V, and GIF. Submit via a public URL, FAL storage URL, or presigned link from your own storage bucket. Output is always a styled MP4 with subtitles burned in.
