Clean rollup or scrolling captions from .vtt files exported by YouTube, yt-dlp, Otter, Zoom, Teams, or Google Meet. 100% in your browser — works fully offline.
Clean Rollup Captions From VTT Files Online Free
Remove duplicated, accumulating subtitle text from YouTube auto-generated and live-caption VTT files.
What the Subtitle Rollup Cleaner Does
The Subtitle Rollup Cleaner takes a WebVTT file containing rollup or scrolling captions (the style found in YouTube auto-captions, including yt-dlp downloads of them, and in captions from Zoom, Microsoft Teams, Google Meet, Otter, and other live speech recognizers) and collapses the cumulative duplication into clean, deduplicated cues. The output is a normal WebVTT file with one sentence per cue, accurate timestamps inherited from the source, and no carry-over text. Everything runs 100% in your browser. Files never leave your device.
If you have ever opened a YouTube auto-caption file and found the same sentence repeated thirty or forty times in slowly growing fragments, you have a rollup file. This tool fixes it.
What rollup or scrolling captions look like
In a rollup file, each cue contains the full sentence built up so far, with one or two new words added each time. The same sentence appears across dozens of cues, each slightly longer than the last. A short example fragment from a real-world YouTube file:
00:00:34.040 --> 00:00:34.320
Hello,

00:00:34.320 --> 00:00:34.440
Hello, you

00:00:34.440 --> 00:00:34.600
Hello, you are

00:00:34.600 --> 00:00:34.720
Hello, you are very

00:00:34.720 --> 00:00:34.920
Hello, you are very welcome
Played in a video player, these snapshots create the smooth scrolling effect you see beneath a YouTube live stream. As a file, however, they are unreadable, untranslatable, and useless as a transcript.
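Before anything can be collapsed, the file has to be read into cues. The following is a minimal Python sketch of that step, not the cleaner's actual source. It assumes cues are separated by blank lines, skips blocks that have no timing line (the WEBVTT header, NOTE and STYLE blocks), ignores cue identifiers and cue settings, and stores times as integer milliseconds. The names Cue, to_ms, and parse_vtt are hypothetical.

```python
import re
from dataclasses import dataclass

# WebVTT timestamps are HH:MM:SS.mmm, or MM:SS.mmm when the hours field is omitted.
TIMESTAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
TIMING_LINE = re.compile(rf"^({TIMESTAMP})\s+-->\s+({TIMESTAMP})")


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def to_ms(timestamp: str) -> int:
    """Convert a WebVTT timestamp to integer milliseconds."""
    clock, millis = timestamp.split(".")
    fields = [int(f) for f in clock.split(":")]
    if len(fields) == 2:  # MM:SS form: treat hours as zero
        fields.insert(0, 0)
    hours, minutes, seconds = fields
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def parse_vtt(source: str) -> list[Cue]:
    """Return the cues of a WebVTT document in file order."""
    cues = []
    normalized = source.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING_LINE.match(line)  # trailing cue settings are ignored
            if match:  # an optional cue identifier may precede the timing line
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(to_ms(match.group(1)), to_ms(match.group(2)), text))
                break
    return cues
```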
How the cleaner detects and removes rollup
The cleaner walks through the file looking for cumulative prefix chains: runs of cues in which the previous cue’s full text is a prefix of the current cue’s text. When a chain breaks, the last and most complete snapshot in that chain is kept and the rest are discarded. The kept cue inherits the start time of the first snapshot and the end time of the last, preserving accurate sync with the original audio. Two safeguards prevent false positives: cues only chain if they are close in time (less than two seconds apart), and only true cumulative growth counts, so legitimate spaced repetition in normal dialogue is preserved.
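The chain-collapsing step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the cleaner's code: it compares cues word by word (so "cat" does not count as a prefix of "category"), reads the two-second safeguard as the gap between one cue's end and the next cue's start, and requires strict growth, so an exact repeat starts a new chain. Cue, extends, collapse_rollup, and MAX_GAP_MS are hypothetical names.

```python
from dataclasses import dataclass

# Assumed reading of the "under two seconds apart" safeguard: end-to-start gap.
MAX_GAP_MS = 2000


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def extends(prev: str, cur: str) -> bool:
    """True if cur repeats every word of prev, in order, and adds at least one more."""
    prev_words, cur_words = prev.split(), cur.split()
    return len(cur_words) > len(prev_words) and cur_words[: len(prev_words)] == prev_words


def collapse_rollup(cues: list[Cue], max_gap_ms: int = MAX_GAP_MS) -> list[Cue]:
    """Keep the most complete snapshot of each cumulative-prefix chain,
    timed from the chain's first start to its last end."""
    out: list[Cue] = []
    first = last = None  # first and most recent snapshot of the open chain
    for cue in cues:
        if (
            last is not None
            and cue.start_ms - last.end_ms < max_gap_ms
            and extends(last.text, cue.text)
        ):
            last = cue  # chain keeps growing
            continue
        if last is not None:  # chain broken: emit its final snapshot
            out.append(Cue(first.start_ms, last.end_ms, last.text))
        first = last = cue
    if last is not None:  # flush the chain still open at end of file
        out.append(Cue(first.start_ms, last.end_ms, last.text))
    return out
```

Run on the five snapshots from the example fragment, this sketch yields a single cue from 00:00:34.040 to 00:00:34.920 reading "Hello, you are very welcome".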
When You Need a Rollup Cleaner
Cleaning YouTube auto-generated VTT for transcripts
YouTube’s auto-caption track downloads as a VTT file in rollup format, which is unsuitable for use as a transcript, a study aid, or input to a translation pipeline. Running it through the cleaner produces a one-sentence-per-cue file that reads naturally and can be processed further.
Cleaning yt-dlp --write-auto-sub exports
Command-line tools like yt-dlp fetch YouTube’s auto-caption track using the --write-auto-sub flag, but the resulting VTT carries the same rollup duplication as the YouTube Studio download. The cleaner accepts these files directly.
Cleaning Zoom, Teams, and Google Meet live caption exports
Meeting platforms that export live captions as VTT typically use the same rollup pattern, because live transcription emits text incrementally as words are recognized, whichever vendor provides it. If you are saving meeting captions to feed into note-taking or compliance workflows, clean them first to get sentences instead of fragments.
Cleaning Otter and Web Speech API recorder output
Browser-based transcription tools built on the Web Speech API produce rollup-style VTT for the same reason. Whatever the source, if your file has growing-prefix duplication, the cleaner will collapse it.
How to Clean a Rollup VTT File (Step by Step)
1. Download or export your VTT file
From YouTube Studio: open the video, click Subtitles, choose the auto-generated track, and select Download as VTT.
From yt-dlp: run yt-dlp --skip-download --write-auto-sub --sub-format vtt URL (replace URL with the video address).
From Zoom or Teams: enable closed captions during the meeting and download the captions track afterwards.
2. Upload it to the cleaner
Click Choose File above and select your .vtt file. Nothing is uploaded to a server — the file is read into your browser’s memory only.
3. Click Clean Rollup, then download
Click Clean Rollup. The cleaner processes a typical file in a fraction of a second and shows you how many cues were collapsed. Click the download link to save the cleaned .vtt to your device. Your original file is not modified.
Key Features of the Subtitle Rollup Cleaner
Accurate cumulative-prefix detection
The cleaner does not rely on regex hacks or naive line matching. It builds chains based on the actual cumulative growth pattern, handles two-line rollup cues where the top line scrolls off, and merges fragments that belong to the same sentence even when they were split across multiple rollup commits.
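Two-line rollup cues, where the top line scrolls off as a new line arrives, do not form a clean prefix chain: the next cue starts with the previous cue's last line rather than its first. One common way to stitch such cues together is to drop the longest run of words that both ends one cue and starts the next. The Python sketch below shows that overlap merge; it is an assumed technique for illustration, not a description of how the cleaner does it, and merge_overlap and min_overlap are hypothetical names.

```python
def merge_overlap(accumulated: str, incoming: str, min_overlap: int = 1) -> str:
    """Append incoming to accumulated, dropping the longest run of words that
    ends accumulated and starts incoming (at least min_overlap words long).
    Whitespace, including line breaks, is normalized to single spaces."""
    acc, inc = accumulated.split(), incoming.split()
    # Try the longest possible overlap first so the biggest repeated run is removed.
    for size in range(min(len(acc), len(inc)), min_overlap - 1, -1):
        # size > 0 guards against acc[-0:], which would be the whole list.
        if size > 0 and acc[-size:] == inc[:size]:
            return " ".join(acc + inc[size:])
    return " ".join(acc + inc)  # no overlap found: plain concatenation
```

A single shared word is weak evidence of overlap (one sentence can end with "the" and the next begin with it), so raising min_overlap trades fewer false merges for occasionally missing a genuine one-word carry-over.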
Original timestamps preserved
Each cleaned cue takes the start time of the first rollup snapshot in its chain and the end time of the last. Your transcript stays aligned with the speech.
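Keeping times as integer milliseconds lets the inherited start and end times be written back out exactly, with no floating-point rounding. Here is a minimal Python sketch of the output side, assuming cues are (start_ms, end_ms, text) tuples; format_ms and write_vtt are hypothetical names, not the cleaner's code.

```python
def format_ms(ms: int) -> str:
    """Format non-negative integer milliseconds as a WebVTT HH:MM:SS.mmm timestamp."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def write_vtt(cues: list[tuple[int, int, str]]) -> str:
    """Serialize cues as a WebVTT document: header, then blank-line-separated cues."""
    blocks = ["WEBVTT"]
    for start_ms, end_ms, text in cues:
        blocks.append(f"{format_ms(start_ms)} --> {format_ms(end_ms)}\n{text}")
    return "\n\n".join(blocks) + "\n"
```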
Conservative when in doubt
Cues that do not match the rollup pattern are passed through unchanged. A time-gap safeguard prevents the tool from merging legitimate spaced dialogue. If your file is not actually rollup, the output is essentially identical to the input.
100% private and offline
Processing runs in your browser. Your files are never uploaded, never logged, never seen by anyone but you. No account, no signup, no usage limits.
Rollup, Scrolling, Progressive, Pop-on: A Quick Glossary
Caption rendering styles are named confusingly. Here is what each term means in practice and how it relates to the cleaner.
Rollup captions
The broadcast-captioning term (one of the display modes defined in the CEA-608 standard) for captions that scroll one line at a time, with new text pushing existing text upward. This is what YouTube auto-captions produce.
Scrolling captions
Same thing as rollup. Used interchangeably in informal contexts and by some desktop subtitle editors.
Progressive captions
Synonymous with rollup in the speech-recognition context. Each cue contains the recognizer’s current best-guess transcript so far. All three terms describe the same cumulative pattern, and the cleaner handles them identically.
Pop-on captions (for contrast)
Captions that appear and disappear as complete units, one cue at a time, with no accumulation. This is the standard format you see in most movies and TV. Pop-on captions do not need cleaning — the cleaner will simply pass them through.
Related Subtitle Tools
After cleaning, you may want to convert formats, adjust timing, or split the file for translation:
VTT to SRT Converter
VTT to TXT Converter
Subtitle Time Shifter
Subtitle Splitter
Subtitle Tag Stripper
Why Choose Subtitles Edit
This site is built around one principle: subtitle work should be fast, private, and free. Every tool runs entirely in your browser. There are no uploads, no accounts, no usage caps, and no AI hallucinations to second-guess. The rollup cleaner uses deterministic pattern detection on the text of your file, not a language model, so its behaviour is predictable and its output is reproducible.
Frequently Asked Questions
What are rollup captions in a VTT file?
Rollup captions, sometimes called scrolling or progressive captions, are a captioning style where each cue contains the entire sentence built up so far, with each new cue adding one or two more words. They are produced by live speech recognizers used by YouTube, Zoom, Teams, Google Meet, and Otter. Played in a video they appear as a smoothly scrolling line, but the underlying file contains heavy text duplication.
Why does my YouTube auto-generated VTT file have duplicated text?
YouTube auto-captions use rollup rendering, where each cue carries forward all previous words and adds new ones. This is intentional for display but creates files where the same sentence appears dozens of times in slightly longer forms. The Subtitle Rollup Cleaner collapses these chains into single clean cues with accurate timing.
Will the cleaner work with VTT files from sources other than YouTube?
Yes. The tool works with any WebVTT file that uses rollup-style cumulative cues, including exports from yt-dlp, Otter, Zoom, Microsoft Teams, Google Meet, and Web Speech API recorders. The detection is pattern-based, not source-specific.
Does the tool change my original VTT file?
No. Your original file is never uploaded, modified, or stored. The cleaner runs entirely in your browser and produces a new cleaned file for download. The source file on your device is untouched.
Is timing preserved after cleaning?
Yes. Each cleaned cue inherits the start time of the first rollup snapshot in its chain and the end time of the last snapshot, so each sentence remains aligned with the speech in your video. The cleaner does not shift, retime, or invent timestamps.
What happens if my file is not in rollup format?
Cues that do not match the rollup pattern are passed through unchanged. If your file contains no rollup duplication, the output will be essentially identical to the input. The tool only collapses cues when the cumulative-prefix pattern is clearly present.
Can the tool wrongly merge separate sentences that happen to repeat?
The cleaner uses two safeguards. First, it only chains cues that are close in time (a gap of under two seconds). Second, it only collapses cues whose growth matches cumulative rollup: each cue must repeat the previous cue’s text in full and extend it. Legitimate dialogue with spaced repetition is preserved as separate cues.
Does the tool support SRT files?
Not yet. Rollup output is overwhelmingly a WebVTT phenomenon, because the live speech recognizers that generate it typically export VTT natively. The current version is VTT-only. If you have an SRT file from a rollup-style source, convert it to VTT first using the SRT to VTT Converter.
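For the curious, the core of an SRT-to-VTT conversion is small: add the WEBVTT header and change the comma before the milliseconds in each timing line to a period. The Python sketch below illustrates that idea under simple assumptions (well-formed SRT with HH:MM:SS,mmm timestamps); it is not the code behind the SRT to VTT Converter, and srt_to_vtt is a hypothetical name. SRT's numeric cue counters are left in place, since WebVTT accepts them as cue identifiers.

```python
import re

# SRT writes milliseconds after a comma (00:00:01,000); WebVTT uses a period.
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert well-formed SRT text to WebVTT text."""
    lines = srt.lstrip("\ufeff").replace("\r\n", "\n").split("\n")
    # Only rewrite timing lines, so commas in the caption text are untouched.
    converted = [
        SRT_TIMESTAMP.sub(r"\1.\2", line) if "-->" in line else line for line in lines
    ]
    return "WEBVTT\n\n" + "\n".join(converted).strip() + "\n"
```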
How do I download YouTube auto-captions in the first place?
If you own the video, open YouTube Studio, select the video, click Subtitles, choose the auto-generated track, and select Download as VTT. For videos you do not own, command-line tools such as yt-dlp can fetch the auto-caption track with the --write-auto-sub flag, which produces a VTT file in rollup format ready to clean.
Is the cleaner free, and does it require an account?
Yes, fully free. No account, no signup, no upload, no usage limits. The tool runs in your browser and processes files locally. Your subtitles never leave your device.
How big a VTT file can the cleaner handle?
Because processing happens in your browser, file size is limited only by your device’s memory. VTT files from multi-hour streams (tens of thousands of cues) clean in a few seconds on a typical laptop. There is no server-side cap.
Does the cleaner preserve formatting tags or VTT cue settings?
Plain text and line breaks within cleaned cues are preserved. VTT cue settings (such as positioning hints) and inline tags from the raw rollup output are dropped, because they refer to the rollup rendering rather than to the cleaned sentences. If you need styling, apply it after cleaning.
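As an illustration of what gets dropped, here is a Python sketch that strips inline tags (for example c class spans and inline timestamp tags such as <00:00:34.320>, which commonly appear in auto-generated rollup files) and removes cue settings from a timing line. It assumes a single regex pass is enough, which holds for simple tags like these but is not a full WebVTT parser; strip_inline_tags and drop_cue_settings are hypothetical names.

```python
import re

INLINE_TAG = re.compile(r"<[^>]*>")  # <c>, </c>, <v Name>, <00:00:34.320>, ...
TIMING_WITH_SETTINGS = re.compile(r"^(\S+\s+-->\s+\S+).*$")


def strip_inline_tags(text: str) -> str:
    """Remove WebVTT inline tags, keeping the text between them."""
    return INLINE_TAG.sub("", text)


def drop_cue_settings(timing_line: str) -> str:
    """Keep 'start --> end' and discard trailing settings such as align:start."""
    return TIMING_WITH_SETTINGS.sub(r"\1", timing_line)
```

Because a voice tag carries the speaker name inside the tag itself, removing <v Name> this way also discards the name.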