Bilingual Subtitle Interleaver

This tool merges two subtitle tracks into a single dual-language file: each cue stacks both languages on separate lines so you can read along in two languages at once. It is built for language learners who pair native and target-language subtitles. Both SRT and WebVTT are supported on input, with your choice of output format. Everything runs entirely in your browser; nothing is uploaded to a server.

File A

File B

Options

Alignment mode: Match by cue index | Match by closest timestamp

Output format

Show language labels in output

Output (dual-language)

How it works

Load both subtitle files into File A and File B using paste, drag-and-drop, or the Choose file buttons. The tool detects SRT and WebVTT automatically from the file contents, and you can mix formats freely—for example, an SRT track in File A and a WebVTT export in File B—without converting either side first.
Choose how the two timelines should line up. "Match by cue index" pairs the first cue of File A with the first cue of File B, the second with the second, and continues in lockstep down the list. "Match by closest timestamp" instead pairs each cue in File A with the cue in File B whose start time is nearest, as long as that neighbor falls within a two-second tolerance window, which helps when translators split lines differently.
Pick the output format that fits your workflow. SRT remains the most widely supported caption format and will open in virtually every desktop media player, editor, and streaming prep pipeline. WebVTT is the native subtitle format for browsers and HTML5 video, which makes it the better choice when you are embedding captions on a website or testing in a web player.
Optionally enable language labels so each block of dialogue is prefixed with a short tag such as [English] or [Spanish]. That extra structure is helpful when you are annotating lines, building study notes, or exporting text for side-by-side review, but most viewers do not need labels for everyday playback.
When the preview looks right, use Copy to place the merged text on your clipboard or Download to save a file. The saved filename is either bilingual.srt or bilingual.vtt depending on the output format you selected, so you can drop it straight into your player or editor without renaming.
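As a rough illustration of the last three steps, here is a minimal sketch of how stacked text, optional labels, and the two output formats could be produced. The cue shape ({ start, end, text } with times in milliseconds) and all function names are assumptions made for this example, not the tool's actual code.

```javascript
// Hypothetical cue shape: { start: ms, end: ms, text: string }.

// Format milliseconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT).
function formatTimestamp(ms, format) {
  const pad = (n, width) => String(n).padStart(width, "0");
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  const sep = format === "vtt" ? "." : ",";
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}${sep}${pad(millis, 3)}`;
}

// Stack two texts on separate lines, optionally prefixing each with a label.
// `labels` is either null or an object such as { a: "English", b: "Spanish" }.
function stackText(textA, textB, labels) {
  const a = labels ? `[${labels.a}] ${textA}` : textA;
  const b = labels ? `[${labels.b}] ${textB}` : textB;
  return `${a}\n${b}`;
}

// Serialize merged cues. SRT blocks carry a 1-based counter;
// WebVTT starts with a WEBVTT header and needs no counter.
function serialize(cues, format) {
  const blocks = cues.map((cue, i) => {
    const timing = `${formatTimestamp(cue.start, format)} --> ${formatTimestamp(cue.end, format)}`;
    return format === "vtt"
      ? `${timing}\n${cue.text}`
      : `${i + 1}\n${timing}\n${cue.text}`;
  });
  const header = format === "vtt" ? "WEBVTT\n\n" : "";
  return header + blocks.join("\n\n") + "\n";
}

// The download name follows the selected output format.
function outputFilename(format) {
  return format === "vtt" ? "bilingual.vtt" : "bilingual.srt";
}
```

Calling serialize with a single cue whose text is stackText("Hello", "Hola", { a: "English", b: "Spanish" }) would yield one SRT block with [English] Hello on the first text line and [Spanish] Hola on the second.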

When to use cue index versus closest timestamp

The better alignment mode depends entirely on how your two subtitle files were authored, exported, and timed. When both tracks describe the same master in the same way, index pairing is simpler and faster. When cue boundaries diverge because of translation style or platform differences, timestamp pairing keeps the dialogue aligned even if the lists are different lengths.

Use cue index when both files come from the same source

Reach for cue index when both files clearly belong to the same release pipeline—two language tracks exported from the same disc image, two caption streams downloaded for the same YouTube upload, or dual-language assets delivered together from a streaming vendor. In those situations the dialogue order and cut points almost always line up cue for cue, so positional pairing is both the quickest option and the least likely to drift. You still get the stacked bilingual text in every entry, but you avoid the extra bookkeeping that timestamp mode performs when it searches for neighbors.
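Positional pairing is simple enough to sketch in a few lines. The function name and the cue objects are hypothetical, and keeping unmatched trailing cues with a null partner is an assumption for this example; the page does not say how the tool handles them in index mode.

```javascript
// Pair cues strictly by position: A[0] with B[0], A[1] with B[1], and so on.
// Assumption: when one list is longer, its extra cues are kept with a null partner.
function pairByIndex(cuesA, cuesB) {
  const length = Math.max(cuesA.length, cuesB.length);
  const pairs = [];
  for (let i = 0; i < length; i++) {
    pairs.push({
      a: i < cuesA.length ? cuesA[i] : null,
      b: i < cuesB.length ? cuesB[i] : null,
    });
  }
  return pairs;
}
```

There is no searching or time comparison here, which is why this mode is the cheapest when both tracks share the same cut points.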

Use closest timestamp when files have different cue counts

Professional and fan translators routinely split or merge lines differently across languages. A long English sentence might become two shorter Spanish cues, or two terse German lines might be combined into a single flowing French cue. When cue counts differ, index mode keeps pairing positionally anyway, which means one mismatch early in the file can leave every later cue stacked against the wrong line. Closest-timestamp mode instead walks through File A and claims the nearest unused cue in File B inside a two-second window, so the languages stay aligned on the timeline even when the lists no longer match one-to-one.
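The closest-timestamp behavior described above can be sketched as a greedy search. This is an illustration under stated assumptions, not the tool's source: the names are hypothetical, cue start times are assumed to be in milliseconds, the two-second boundary is treated as inclusive, and the final sort by start time is an assumption about where leftover File B cues end up.

```javascript
const TOLERANCE_MS = 2000; // the two-second window described above

// For each cue in A, claim the nearest unclaimed cue in B whose start time
// lies within the tolerance. Leftover B cues are emitted on their own.
function pairByClosestStart(cuesA, cuesB, toleranceMs = TOLERANCE_MS) {
  const claimed = new Set();
  const pairs = [];

  for (const a of cuesA) {
    let bestIndex = -1;
    let bestDistance = Infinity;
    cuesB.forEach((b, j) => {
      if (claimed.has(j)) return; // each B cue can be used only once
      const distance = Math.abs(b.start - a.start);
      if (distance <= toleranceMs && distance < bestDistance) {
        bestIndex = j;
        bestDistance = distance;
      }
    });
    if (bestIndex >= 0) claimed.add(bestIndex);
    pairs.push({ a, b: bestIndex >= 0 ? cuesB[bestIndex] : null });
  }

  // Keep B cues that nobody claimed so no dialogue is dropped.
  cuesB.forEach((b, j) => {
    if (!claimed.has(j)) pairs.push({ a: null, b });
  });

  // Assumption: order the result on the timeline by each pair's start time.
  const startOf = (pair) => (pair.a !== null ? pair.a.start : pair.b.start);
  pairs.sort((x, y) => startOf(x) - startOf(y));
  return pairs;
}
```

Because the search is greedy and walks File A in order, an early cue in A can claim a B cue that a later A cue would have matched more closely. For typical subtitle tracks, where cues are seconds apart, that trade-off rarely matters.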

Common use cases

The interleaver is aimed at anyone who wants both languages visible in a single subtitle track instead of juggling two separate files. The scenarios below are the ones we see most often in feedback from learners, teachers, and polyglot viewers.

Studying a language with films and TV

Language learners often keep the original dialogue while reading a familiar language underneath, so they can confirm meaning without breaking immersion in the spoken target language. Stacking both languages in one cue lets the eyes move vertically instead of hunting through two separate tracks. People who pair video with Migaku, Language Reactor, LingQ, or similar study stacks can export or paste the merged SRT or VTT as clean input material for those workflows; the tool does not replace those products, it simply prepares a dual-language file they can import like any other subtitle.

Comparing official subtitles against a fan translation

Collectors, teachers, and advanced learners sometimes want the licensed translation on one line and a community version on the next. Seeing both interpretations inside the same timed cue makes it easier to notice mistranslations, tone shifts, or localization choices without constantly pausing to swap tracks. Because timing stays anchored to whichever alignment mode you chose, you can scan an entire episode for divergences the way you would use diff tools for plain text, only here the structure is still a valid subtitle file.

Preparing dual-language study material

Tutors, conversation partners, and self-study groups often distribute offline clips where students must read both languages without toggling subtitle menus mid-scene. A merged file behaves like any ordinary SRT or WebVTT in VLC, IINA, MPC-HC, classroom projectors, or learning management systems that accept standard caption uploads. That means you can email the file, archive it on a shared drive, or load it on a flight without installing specialized bilingual playback software—just one track with both languages stacked where learners expect them.

Why use this tool

Long films and TV episodes can contain thousands of subtitle cues, which quickly runs into token limits when you ask ChatGPT-style assistants to merge entire tracks in one shot. This tool avoids that ceiling because it runs locally in JavaScript, so there is no context window and no artificial cap on file length. Pairing two timelines also demands precise millisecond arithmetic on every cue; generative models occasionally hallucinate timestamps, merge dialogue incorrectly, or drop entries, whereas this merger applies deterministic rules so the same inputs always produce the same output. Privacy matters just as much: nothing is uploaded to a server, which is important for classroom clips, personal collections, or any source you would hesitate to hand to a third party. There are no accounts, no usage quotas, and no server-side inspection of your subtitle contents.

Frequently Asked Questions

What is a bilingual subtitle file?

It is a normal SRT or WebVTT file where each timed cue contains two languages at once, usually stacked so one language appears on the first line (or block) and the other directly underneath. Players and study tools then show both readings in sync with the same on-screen timing.

Will this work for Netflix or YouTube subtitles?

Yes, as long as you have legitimately obtained subtitle files for both languages. Results are cleanest when both tracks come from the same platform or release, because cue boundaries and timing then tend to line up more predictably.

What's the difference between cue index and closest timestamp alignment?

Match by cue index pairs the first cue in File A with the first cue in File B, the second with the second, and so on. It is the fastest option when both files follow the same dialogue order. Match by closest timestamp is for files where translators split or merged lines differently: for each cue in File A, the tool picks the unclaimed cue in File B whose start time is nearest, as long as it falls within the two-second tolerance window, then adds any leftover File B cues on their own.

Can I mix an SRT and a VTT file?

Yes. Each file is detected on its own: content that starts with a WEBVTT header is parsed as WebVTT, and otherwise it is treated as SRT. The output format is a separate choice, so you still pick whether the merged file is written as SRT or VTT.
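A minimal sketch of that kind of detection, plus a timestamp parser that accepts both notations, might look like the following. The function names are hypothetical, and falling back to SRT for anything without a WEBVTT header is an assumption for this example.

```javascript
// Decide the format from the content itself rather than the file extension.
// Assumption: anything that does not begin with "WEBVTT" is treated as SRT.
function detectFormat(content) {
  const trimmed = content.replace(/^\uFEFF/, "").trimStart(); // drop a BOM if present
  return trimmed.startsWith("WEBVTT") ? "vtt" : "srt";
}

// Convert a timestamp to milliseconds. SRT uses a comma before the
// milliseconds (00:01:02,345); WebVTT uses a dot and may omit the hours.
// Returns null when the text is not a recognizable timestamp.
function parseTimestamp(stamp) {
  const match = /^(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})$/.exec(stamp.trim());
  if (!match) return null;
  const [, hours = "0", minutes, seconds, millis] = match;
  return ((Number(hours) * 60 + Number(minutes)) * 60 + Number(seconds)) * 1000 + Number(millis);
}
```

Once both files are reduced to cues with millisecond start and end times, the alignment step no longer needs to know which format each side came from.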

Does this tool upload my files anywhere?

No. Parsing, alignment, and export all happen in your browser. Your files never leave your device.

Why don't my two subtitle files have the same number of cues?

Different translators and platforms often split sentences into different numbers of on-screen lines. If counts do not match, try closest timestamp mode so cues are paired by time instead of by position in the list.