VTT Generator

Click or drag your audio/video file here

  • MP3 · WAV · M4A · MP4, up to 1 GB
  • 60+ languages
  • ~3 min per hour of audio
  • 6 export formats

Trusted by 100,000+ podcasters and creators. Private and secure: your files stay yours.

Drop Audio or Video ➞ Generate the VTT File

Generate WebVTT caption files from any audio or video. Accurate, automatically timed cues for HTML5 video, web players, and accessibility compliance.

Castmagic

Captions in the Format the Web Actually Speaks

WebVTT is the caption format of the open web: it's what the HTML5 <track> element expects, what modern web video players consume natively, and what most embedded-video platforms ask for when you add captions. If your video lives on a website — a course platform, a product page, a help center — VTT is the file you need.

Castmagic generates it straight from the media. Upload audio or video, transcription runs with word-level timing, and the exported VTT carries cues that match the speech exactly — readable lengths, accurate boundaries, standard syntax. Edit the transcript first and the captions inherit every fix.
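As a rough illustration of how word-level timing can become readable cues, here is a minimal Python sketch. It is not Castmagic's exporter. It greedily groups (word, start, end) entries into cues capped by character count and duration, then writes them in WebVTT layout: a `WEBVTT` header followed by blank-line-separated cues, each with a `start --> end` timing line. The `Word` class, the function names, and the 42-character / 6-second limits are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp: HH:MM:SS.mmm (dot before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def group_words(
    words: list[Word], max_chars: int = 42, max_seconds: float = 6.0
) -> list[tuple[float, float, str]]:
    """Greedily pack consecutive words into cues without exceeding either limit.

    The limits are illustrative defaults (assumptions), not values from the WebVTT spec.
    A single word longer than max_chars still becomes its own cue.
    """
    cues: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        if current:
            candidate = " ".join(w.text for w in current + [word])
            too_long = len(candidate) > max_chars
            too_slow = word.end - current[0].start > max_seconds
            if too_long or too_slow:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)
    # Each cue spans from its first word's start to its last word's end.
    return [(c[0].start, c[-1].end, " ".join(w.text for w in c)) for c in cues]


def build_vtt(cues: list[tuple[float, float, str]]) -> str:
    """Assemble a WebVTT document from (start, end, text) cues."""
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        blocks.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n{text}")
    # A blank line separates the header and every cue.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    words = [Word("Welcome", 0.0, 0.4), Word("to", 0.4, 0.5),
             Word("the", 0.5, 0.6), Word("show.", 0.6, 1.1)]
    print(build_vtt(group_words(words)))
```

With the sample words this prints a single cue, `00:00:00.000 --> 00:00:01.100` followed by `Welcome to the show.`. Greedy packing is the simplest choice; a production exporter might also prefer breaking at punctuation or pauses.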

Where VTT files go to work

Self-hosted HTML5 video with a <track> tag. Course videos on learning platforms. Product demos and onboarding videos in web apps. Help-center walkthroughs. Video players like Video.js, Plyr, and JW Player. Anywhere video is embedded in a page, VTT is how the captions ride along.

Captions are an accessibility requirement, not a nice-to-have

Accessibility standards for web content, such as WCAG, expect spoken material to have text alternatives, and captions are the baseline for video. A generated VTT takes embedded video from uncaptioned to captioned in minutes, with accuracy you've verified in the editor rather than auto-captions you can't edit.

VTT details done right

WebVTT times cues with a dot for milliseconds (HH:MM:SS.mmm) where SRT uses a comma — small differences like this are why hand-converting between formats goes wrong. Castmagic exports clean, spec-correct VTT, and the SRT version is in the same menu if a different tool in your chain wants SubRip instead.
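To show where hand conversion trips up, here is a minimal Python sketch of an SRT-to-VTT conversion that assumes well-formed SubRip input. It adds the required `WEBVTT` header and swaps the comma for a dot only on timing lines, so commas inside caption text are left alone. The function name is hypothetical, and the sketch ignores SRT styling and position extensions.

```python
import re

# An SRT timing line, e.g. "00:01:02,345 --> 00:01:04,000".
# Only the two timestamps are captured; anything after them is dropped.
SRT_TIMING = re.compile(r"^(\d+:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d+:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SubRip text to WebVTT.

    Adds the WEBVTT header and replaces the comma before milliseconds with a
    dot on timing lines only. SRT cue numbers are kept, since WebVTT accepts
    them as optional cue identifiers.
    """
    normalized = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out = ["WEBVTT", ""]
    for line in normalized.split("\n"):
        match = SRT_TIMING.match(line)
        if match:
            start, start_ms, end, end_ms = match.groups()
            line = f"{start}.{start_ms} --> {end}.{end_ms}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(srt_to_vtt("1\n00:00:01,000 --> 00:00:02,500\nHello, world\n"))
```

A blanket find-and-replace of commas would also corrupt caption text such as "Hello, world", which is why the sketch only rewrites lines that match the timing pattern.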

The transcript does double duty

The same transcript behind the VTT exports as text, PDF, Word, and CSV — and powers AI-generated summaries and descriptions for the page the video lives on. Caption the video and write its supporting copy from one upload.

World Class VTT Generator

We Power The Best Creators

How To Generate a VTT File


Upload your audio or video file — or paste a link

Drag your audio or video file into the uploader above, or paste a link if it lives online (YouTube, a podcast feed, cloud storage). Common audio and video formats are all supported.


Castmagic transcribes it

Transcription starts immediately — 60+ languages with auto-detect, speaker labels, and word-level timestamps. An hour of audio typically processes in 3-5 minutes.


Review and polish the transcript

Open the transcript in the editor: rename speakers, correct any misheard terms, and add custom spellings so brand names and jargon come out right on every future upload.

Not just another transcription tool

Dimension | Typical transcription tool | Castmagic
What you get | A text file | A transcript with identified speakers and timestamps, plus AI-drafted summaries, show notes, and posts from the same upload
Languages and translation | Transcription only, often English-centric | 60+ transcription languages; translate any transcript into 11 languages while keeping timestamps and speakers
Export formats | TXT, maybe SRT | TXT, SRT, VTT, PDF, DOCX, and CSV: every format, every language, one menu
After transcription | You're on your own | Ask Magic Chat about the recording, search your whole library, and generate content with AI templates

Download your VTT

Export a WebVTT caption file with exact cue timing, ready for HTML5 video and modern players. The other formats (TXT, SRT, PDF, DOCX, and CSV) are one click away in the same menu.

Generate content from the transcript

The transcript doubles as a content source: Castmagic's AI presets draft summaries, show notes, blog posts, social clips, and follow-up emails from the same audio or video file.

Endless Content Assets In Seconds

Automate all the tedious work that comes with editing and copywriting, and say hello to your new best content editor.

Integrate Content From All Your Favorite Platforms

RSS
Zoom
Google Drive
Wistia
Descript
YouTube
Vimeo
TikTok
Instagram
Twitch
Loom
Zapier

Professional Creators Love Castmagic

Castmagic is just a great product. When it came to creating content around The Calum Johnson Show it made our life a lot easier. Highly recommend
Calum Johnson, YouTuber

Frequently Asked Questions

Last updated June 2026 by the Castmagic team

How do I convert an audio or video file to VTT?

Upload the audio or video file to Castmagic (or paste a link to it), wait a few minutes for transcription, then choose VTT from the download menu. You'll get a WebVTT caption file with exact cue timing, ready for HTML5 video and modern players.

How accurate is the transcription?

Castmagic uses state-of-the-art speech models with support for 60+ languages, automatic language detection, and speaker labeling. Clear single-speaker audio typically transcribes well above 95% accuracy, and a custom-vocabulary list keeps brand names, product names, and industry jargon spelled correctly.

What formats can I download besides VTT?

Every transcript exports to six formats from the same menu: plain text (TXT), SubRip subtitles (SRT), WebVTT captions (VTT), a formatted PDF document, an editable Word document (DOCX), and a structured spreadsheet (CSV) with per-utterance speakers and timings.

Is this free to use?

Castmagic offers a free tier so you can convert an audio file and try the full workflow. Volume use (multiple files per week, longer recordings, and AI-generated content output) is available on paid plans.

What's the difference between VTT and SRT?

WebVTT is the web-native caption format (HTML5 <track>, modern web players) and uses dot-millisecond timing; SRT is the older, universal editor/player format with comma timing. Castmagic exports both from the same transcript — use VTT for web embeds, SRT for editors and YouTube.

Can I use the generated VTT with my website's video player?

Yes — the export is standard WebVTT, which HTML5 <track> elements and players like Video.js, Plyr, and JW Player consume natively.

Can I generate translated captions?

Yes — translate the transcript into any of ten languages (Spanish, French, German, Japanese, and more) and export the translated VTT with identical cue timing.