
Bilingual Subtitle Interleaver

This tool merges two subtitle tracks into a single dual-language file: each cue stacks both languages on separate lines so you can read along in two languages at once. It is built for language learners who pair native and target-language subtitles. Both SRT and WebVTT are supported on input, with your choice of output format. Everything runs entirely in your browser; nothing is uploaded to a server.


How it works

  1. Load both subtitle files into File A and File B using paste, drag-and-drop, or the Choose file buttons. The tool detects SRT and WebVTT automatically from the file contents, and you can mix formats freely—for example, an SRT track in File A and a WebVTT export in File B—without converting either side first.
  2. Choose how the two timelines should line up. "Match by cue index" pairs the first cue of File A with the first cue of File B, the second with the second, and so on down the list. "Match by closest timestamp" instead pairs each cue in File A with the not-yet-paired cue in File B whose start time is nearest, provided that cue falls within a two-second tolerance window. This helps when translators split lines differently.
  3. Pick the output format that fits your workflow. SRT remains the most widely supported caption format and will open in virtually every desktop media player, editor, and streaming prep pipeline. WebVTT is the native subtitle format for browsers and HTML5 video, which makes it the better choice when you are embedding captions on a website or testing in a web player.
  4. Optionally enable language labels so each block of dialogue is prefixed with a short tag such as [English] or [Spanish]. That extra structure is helpful when you are annotating lines, building study notes, or exporting text for side-by-side review, but most viewers do not need labels for everyday playback.
  5. When the preview looks right, use Copy to place the merged text on your clipboard or Download to save a file. The saved filename is either bilingual.srt or bilingual.vtt depending on the output format you selected, so you can drop it straight into your player or editor without renaming.

What the merged output looks like

Suppose File A holds the English track and File B holds the Spanish track, each with a cue at the same time:

File A (English)
1
00:00:01,000 --> 00:00:03,000
Good morning.

File B (Spanish)
1
00:00:01,000 --> 00:00:03,000
Buenos días.

Interleaving them produces one cue with both languages stacked (shown here as SRT):

1
00:00:01,000 --> 00:00:03,000
Good morning.
Buenos días.

With language labels enabled, the same cue is tagged so each line is easy to identify:

1
00:00:01,000 --> 00:00:03,000
[English]
Good morning.
[Spanish]
Buenos días.
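
The stacking shown above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual code (the tool itself runs in JavaScript); the function names `stack_cue_text` and `render_srt_cue` are hypothetical.

```python
from typing import Optional


def stack_cue_text(text_a: str, text_b: str,
                   label_a: Optional[str] = None,
                   label_b: Optional[str] = None) -> str:
    """Stack two languages in one cue body, File A first, File B underneath.

    Hypothetical helper: if a label is given, it is written as a
    bracketed tag on its own line above that language's text.
    """
    lines = []
    for label, text in ((label_a, text_a), (label_b, text_b)):
        if not text:
            continue  # assumption: an empty side is simply left out
        if label:
            lines.append(f"[{label}]")
        lines.append(text)
    return "\n".join(lines)


def render_srt_cue(index: int, timing: str, text: str) -> str:
    """Render one SRT cue: index line, timing line, then the cue text."""
    return f"{index}\n{timing}\n{text}\n"


body = stack_cue_text("Good morning.", "Buenos días.", "English", "Spanish")
print(render_srt_cue(1, "00:00:01,000 --> 00:00:03,000", body))
```

Calling `stack_cue_text` without labels gives the plain two-line form from the first merged example.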

When to use cue index versus closest timestamp

The better alignment mode depends entirely on how your two subtitle files were authored, exported, and timed. When both tracks describe the same master in the same way, index pairing is simpler and faster. When cue boundaries diverge because of translation style or platform differences, timestamp pairing keeps the dialogue aligned even if the lists are different lengths.

Use cue index when both files come from the same source

Reach for cue index when both files clearly belong to the same release pipeline—two language tracks exported from the same disc image, two caption streams downloaded for the same YouTube upload, or dual-language assets delivered together from a streaming vendor. In those situations the dialogue order and cut points almost always line up cue for cue, so positional pairing is both the quickest option and the least likely to drift. You still get the stacked bilingual text in every entry, but you avoid the extra bookkeeping that timestamp mode performs when it searches for neighbors.
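
As a rough illustration, positional pairing amounts to zipping the two cue lists together. The sketch below is an assumption about how such pairing could be written, not the tool's own code. `pair_by_index` is a hypothetical name, and padding the shorter list with `None` is one possible way to handle unequal lengths.

```python
from itertools import zip_longest


def pair_by_index(cues_a, cues_b):
    """Pair cues by position: first with first, second with second.

    Assumption: when one list is longer, the extra cues are kept and
    paired with None rather than dropped.
    """
    return list(zip_longest(cues_a, cues_b))


english = ["Good morning.", "How are you?"]
spanish = ["Buenos días.", "¿Cómo estás?"]
for cue_a, cue_b in pair_by_index(english, spanish):
    print(cue_a, "|", cue_b)
```

No timestamps are consulted at all, which is why this mode is quick and also why it only works when both lists already line up cue for cue.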

Use closest timestamp when files have different cue counts

Professional and fan translators routinely split or merge lines differently across languages. A long English sentence might become two shorter Spanish cues, or two terse German lines might be combined into a single flowing French cue. When cue counts differ, index mode keeps pairing positionally anyway, which means one mismatch early in the file can leave the rest of the stacked dialogue visibly wrong for minutes at a time. Closest-timestamp mode instead walks through File A and claims the nearest unused cue in File B inside a two-second window, so the languages stay aligned on the timeline even when the lists no longer match one-to-one.
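
The matching described above can be sketched as a simple greedy pass. This is a minimal illustration under stated assumptions, not the tool's actual implementation: `pair_by_closest_start` is a hypothetical name, start times are plain integers in milliseconds, and a gap of exactly two seconds is treated as inside the window.

```python
def pair_by_closest_start(starts_a, starts_b, tolerance_ms=2000):
    """Greedily pair each A cue with the nearest unclaimed B cue.

    Returns (pairs, leftovers): pairs is a list of (index_a, index_b),
    where index_b is None if no B cue was close enough; leftovers lists
    the B indices that were never claimed.
    """
    claimed = set()
    pairs = []
    for i, start_a in enumerate(starts_a):
        best_j, best_gap = None, None
        for j, start_b in enumerate(starts_b):
            if j in claimed:
                continue  # each B cue can be used at most once
            gap = abs(start_b - start_a)
            if gap <= tolerance_ms and (best_gap is None or gap < best_gap):
                best_j, best_gap = j, gap
        if best_j is not None:
            claimed.add(best_j)
        pairs.append((i, best_j))
    leftovers = [j for j in range(len(starts_b)) if j not in claimed]
    return pairs, leftovers


# A has 3 cues; B splits one line differently and has an extra late cue.
pairs, leftovers = pair_by_closest_start([1000, 5000, 9000],
                                         [1100, 3200, 5100, 20000])
print(pairs)      # [(0, 0), (1, 2), (2, None)]
print(leftovers)  # [1, 3]
```

The nested loop is quadratic in the number of cues, which is acceptable for a sketch; a real implementation could exploit the fact that both lists are sorted by time.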

Common use cases

The interleaver is aimed at anyone who wants both languages visible in a single subtitle track instead of juggling two separate files. The scenarios below are the ones we see most often in feedback from learners, teachers, and polyglot viewers.

Studying a language with films and TV

Language learners often keep the original dialogue while reading a familiar language underneath, so they can confirm meaning without breaking immersion in the spoken target language. Stacking both languages in one cue lets the eyes move vertically instead of hunting through two separate tracks. People who pair video with Migaku, Language Reactor, LingQ, or similar study tools can export or paste the merged SRT or VTT as clean input material for those workflows. The tool does not replace those products; it simply prepares a dual-language file they can import like any other subtitle.

Comparing official subtitles against a fan translation

Collectors, teachers, and advanced learners sometimes want the licensed translation on one line and a community version on the next. Seeing both interpretations inside the same timed cue makes it easier to notice mistranslations, tone shifts, or localization choices without constantly pausing to swap tracks. Because timing stays anchored to whichever alignment mode you chose, you can scan an entire episode for divergences the way you would use diff tools for plain text, only here the structure is still a valid subtitle file.

Preparing dual-language study material

Tutors, conversation partners, and self-study groups often distribute offline clips where students must read both languages without toggling subtitle menus mid-scene. A merged file behaves like any ordinary SRT or WebVTT in VLC, IINA, MPC-HC, classroom projectors, or learning management systems that accept standard caption uploads. That means you can email the file, archive it on a shared drive, or load it on a flight without installing specialized bilingual playback software—just one track with both languages stacked where learners expect them.

Why use this tool

Long films and TV episodes can contain thousands of subtitle cues, enough to hit token limits when you ask a ChatGPT-style assistant to merge entire tracks in one shot. This tool avoids that ceiling because it runs locally in JavaScript: there is no context window and no artificial cap on file length. Pairing two timelines also demands precise millisecond arithmetic on every cue. Generative models occasionally hallucinate timestamps, merge dialogue incorrectly, or drop entries, whereas this tool applies deterministic rules, so the same inputs always produce the same output. Privacy matters just as much: nothing is uploaded to a server, which is important for classroom clips, personal collections, or any source you would hesitate to hand to a third party. There are no accounts, no usage quotas, and no server-side inspection of your subtitle contents.

Frequently Asked Questions

What is a bilingual subtitle file?

It is a normal SRT or WebVTT file where each timed cue contains two languages at once, usually stacked so one language appears on the first line (or block) and the other directly underneath. Players and study tools then show both readings in sync with the same on-screen timing.

Will this work for Netflix or YouTube subtitles?

Yes, as long as you have legitimately obtained subtitle files for both languages. Results are cleanest when both tracks come from the same platform or release, because cue boundaries and timing then tend to line up more predictably.

What's the difference between cue index and closest timestamp alignment?

Match by cue index pairs the first cue in File A with the first cue in File B, the second with the second, and so on. It is the fastest option when both files follow the same dialogue order. Match by closest timestamp is for files where translators split or merged lines differently: for each cue in A, the tool picks the unclaimed cue in B whose start time is nearest, as long as it is within two seconds either way, then adds any leftover B cues on their own.

Can I mix an SRT and a VTT file?

Yes. Each side is detected independently from its contents — a WEBVTT header marks a file as WebVTT, otherwise it is read as SRT — so you can pair an SRT track with a VTT track freely. You then choose whether the merged file is written out as SRT or VTT.
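
A content-based check like the one described can be sketched as follows. This is an illustration only: `detect_format` is a hypothetical name, and tolerating a byte-order mark or leading whitespace before the header is an assumption rather than documented behavior of the tool.

```python
def detect_format(content: str) -> str:
    """Return "vtt" if the text starts with a WEBVTT header, else "srt"."""
    # Assumption: ignore a leading byte-order mark and stray whitespace.
    stripped = content.lstrip("\ufeff").lstrip()
    return "vtt" if stripped.startswith("WEBVTT") else "srt"


file_a = "1\n00:00:01,000 --> 00:00:03,000\nGood morning.\n"
file_b = "WEBVTT\n\n00:00:01.000 --> 00:00:03.000\nBuenos días.\n"
print(detect_format(file_a), detect_format(file_b))  # srt vtt
```

Because each side is checked separately, the two inputs never need to share a format.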

Does this tool upload my files anywhere?

No. Parsing, alignment, and export all happen locally in your browser using JavaScript; your files are never sent to a server, stored, or logged. That keeps classroom clips, personal collections, and unreleased material private, and closing the tab clears everything you loaded.

Why don't my two subtitle files have the same number of cues?

Different translators and platforms often split sentences into different numbers of on-screen lines. If counts do not match, try closest timestamp mode so cues are paired by time instead of by position in the list.

How is this different from the Subtitle Merger?

The Subtitle Merger joins subtitle tracks along the timeline — useful for stitching together parts of one film or combining clips in sequence. This interleaver instead stacks two languages inside the same cue, so each on-screen entry shows both readings at once. Use the merger to extend a timeline, and the interleaver to study two languages together.

The two languages are in the wrong order — how do I switch them?

Use the Swap A B button above the File B box. It exchanges the contents of the two panels, so whichever language you loaded into File B moves to the top line and File A drops underneath. The output updates instantly, so you can flip the order without reloading either file.

What does the Matched / Unmatched status line mean?

After both files load, the tool reports how many cues it paired and how many were left over on each side. A high unmatched count usually means the two files were split differently or are out of sync. Switching to closest-timestamp mode often raises the matched count by pairing cues within a two-second window.
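
For illustration, counts like these can be derived from a list of pairs in which a missing partner is represented as `None`. The representation and the name `summarize_matches` are assumptions made for this sketch, not the tool's internals.

```python
def summarize_matches(pairs):
    """Count matched pairs and leftovers on each side.

    Each pair is (cue_a, cue_b); None marks a missing partner.
    """
    matched = sum(1 for a, b in pairs if a is not None and b is not None)
    unmatched_a = sum(1 for a, b in pairs if a is not None and b is None)
    unmatched_b = sum(1 for a, b in pairs if a is None and b is not None)
    return {"matched": matched,
            "unmatched_a": unmatched_a,
            "unmatched_b": unmatched_b}


pairs = [("Good morning.", "Buenos días."),
         ("See you later.", None),
         (None, "Hasta luego.")]
print(summarize_matches(pairs))
```

In this example one cue pairs up and one is left over on each side, the kind of result that suggests trying the other alignment mode.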

My subtitles are out of sync before I even merge them — what should I do?

Fix the timing first, then interleave. If a whole track is uniformly early or late, run it through the Subtitle Time Shifter; if it drifts further off as the video plays, use the AI Subtitle Drift Stabilizer. Once both tracks line up against the same video, this tool can pair them cleanly.

Should I export the merged file as SRT or VTT?

Choose SRT for desktop players and editors like VLC, IINA, or most NLEs, since it is the most broadly supported caption format. Choose VTT when embedding captions in a browser or HTML5 video. If you change your mind later, the SRT to VTT and VTT to SRT converters will switch formats without re-merging.
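
One concrete difference between the two formats is the timestamp syntax: SRT separates seconds and milliseconds with a comma, while WebVTT uses a period and begins the file with a WEBVTT header line. The sketch below shows that difference; `format_timestamp` is a hypothetical helper, not part of the tool.

```python
def format_timestamp(ms: int, fmt: str) -> str:
    """Format milliseconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    separator = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{millis:03d}"


print(format_timestamp(1000, "srt"))  # 00:00:01,000
print(format_timestamp(1000, "vtt"))  # 00:00:01.000
```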
