Subtitle Encoding Fixer — Repair Mojibake and Weird Characters

Fix mojibake, weird characters, and garbled text in SRT, VTT, and TXT subtitle files. Free, private, and works entirely in your browser.

First 2000 characters per preview. "After" updates when you change encoding or reverse mojibake.

Before (raw UTF-8)

How the file appears with default UTF-8 reading.

After (fixed)

Result with your chosen encoding applied.

Use the reverse mojibake toggle if your file already contains literal mojibake characters like â€™ or Ã© that need to be un-corrupted.

How It Works

Step 1

Upload

Choose or drag in your .srt, .vtt, or .txt file. The bytes stay on your device; nothing is sent to a server.

Step 2

Preview & Adjust

The tool suggests an encoding from your file. Override it from the list, toggle reverse mojibake if text still looks wrong, and optionally add a UTF-8 BOM for picky players.

Step 3

Download Fixed File

Export clean UTF-8 text with one click. The filename adds -fixed before the extension so your original stays safe.

Before and After

A French subtitle line saved correctly as UTF-8:

1
00:00:01,000 --> 00:00:03,000
"Bonjour, mon ami — ça va?" elle dit avec un sourire.

The same line after a typical encoding round-trip mistake (UTF-8 read as Windows-1252, then re-saved as UTF-8):

1
00:00:01,000 --> 00:00:03,000
"Bonjour, mon ami â€” Ã§a va?" elle dit avec un sourire.

The em-dash (—) has become â€” and the ç has become Ã§. Selecting the correct source encoding from the dropdown, or enabling reverse mojibake if the file has already been double-encoded, restores the original text. Timestamps and cue numbers are never affected.
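
The corruption above can be reproduced in a few lines. This is a minimal Python sketch rather than the tool's own browser code; the variable names are illustrative, and Python's `cp1252` codec stands in for Windows-1252.

```python
# Reproduce the round-trip mistake: UTF-8 bytes read as Windows-1252.
original = '"Bonjour, mon ami — ça va?" elle dit avec un sourire.'

utf8_bytes = original.encode("utf-8")   # how the file is stored on disk
garbled = utf8_bytes.decode("cp1252")   # the wrong decoder applied on open
print(garbled)

# Undo it: map the visible characters back to the bytes they came from,
# then decode those bytes as UTF-8.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored == original)
```

Only the two non-ASCII characters change; every ASCII character in the line decodes the same way under both encodings.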

Mojibake Character Decoder

If your subtitles show literal sequences of two or three Latin characters where accented letters or smart quotes should be, the file is almost certainly UTF-8 being read as Windows-1252. The table below lists the most common mojibake patterns and what they should decode to.

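What you see | What it should be | Original character
â€™ | Right single quote | ’
â€˜ | Left single quote | ‘
â€œ | Left double quote | “
â€ (plus an invisible control character) | Right double quote | ”
â€“ | En dash | –
â€” | Em dash | —
â€¦ | Ellipsis | …
Ã© | e with acute accent | é
Ã¨ | e with grave accent | è
Ã  (Ã followed by a non-breaking space) | a with grave accent | à
Ã§ | c with cedilla | ç
Ã¼ | u with umlaut | ü
Ã¶ | o with umlaut | ö
Ã± | n with tilde | ñ
ÃŸ | German sharp s | ß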

Every pattern in this table has the same root cause: UTF-8 bytes being interpreted as Windows-1252. The fix is the same in all cases. If the encoding picker is set to UTF-8 and you see these sequences, click Try reverse mojibake fix. If the picker is on Windows-1252, switching to UTF-8 (or the correct source encoding for your script) usually resolves them.
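
The patterns in the table can be generated rather than memorised. The sketch below is an illustration in Python, not the tool's code; `misread_as_windows_1252` is a hypothetical helper name. It encodes each character as UTF-8 and decodes the bytes one at a time as Windows-1252, falling back to the matching control character for the five bytes Windows-1252 leaves undefined (which is how browsers treat them).

```python
def misread_as_windows_1252(text: str) -> str:
    """Encode text as UTF-8, then decode each byte as Windows-1252.

    Python's cp1252 codec rejects the five undefined bytes
    (0x81, 0x8D, 0x8F, 0x90, 0x9D), so those fall back to latin-1,
    which yields the C1 control character with the same number.
    """
    out = []
    for byte in text.encode("utf-8"):
        raw = bytes([byte])
        try:
            out.append(raw.decode("cp1252"))
        except UnicodeDecodeError:
            out.append(raw.decode("latin-1"))
    return "".join(out)


for char in "’‘“”–—…éèàçüöñß":
    # repr() makes invisible characters such as \x9d and \xa0 visible.
    print(repr(misread_as_windows_1252(char)), "<-", char)
```

The `repr` output makes it clear why the right double quote and the à look truncated on screen: their last byte maps to a control character and a non-breaking space respectively.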

What Is Mojibake and How Does It Happen?

Subtitle files break when one application saves them in UTF-8 and another opens them with a legacy single-byte encoding like Windows-1252 — or the reverse. That mismatch produces mojibake: strings of replacement symbols, accented letters turned into pairs of Latin characters, or diamond question marks where real letters should be.

Three patterns cause most subtitle corruption:

  • Encoding mismatch. The file was written in one encoding and is being read as another. Russian subtitles saved as Windows-1251 but opened as UTF-8 produce one type of garbled output; UTF-8 French subtitles opened as Windows-1252 produce another.
  • Double-encoding. UTF-8 bytes were misread as Windows-1252, then re-saved as UTF-8. The result is mojibake layered on top of itself: literal sequences like Ã© appearing in the text instead of é. The reverse mojibake toggle is built for this exact case.
  • BOM confusion. Some older Windows applications expect a UTF-8 byte-order mark to recognise multilingual content; others mishandle it. The BOM checkbox lets you control whether the exported file includes one.

How the Tool Detects Encoding

On upload, the Subtitle Encoding Fixer auto-detects encoding in this order:

  • UTF-8 BOM (bytes EF BB BF) → labelled "UTF-8 (BOM)"
  • UTF-16LE BOM (bytes FF FE) → decoded and transcoded to UTF-8 for a consistent pipeline
  • UTF-16BE BOM (bytes FE FF) → decoded and transcoded to UTF-8
  • Strict UTF-8 validation → if all bytes form valid UTF-8 sequences, labelled "UTF-8"
  • Fallback → Windows-1252, a sensible default for Western legacy text
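
The detection order above can be sketched as follows. This is an illustrative Python version, not the tool's browser implementation; the function name `detect_encoding` and the UTF-16 labels are assumptions, while the "UTF-8 (BOM)" and "UTF-8" labels and the Windows-1252 fallback come from the list.

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Return an encoding label, checking BOMs first, then strict UTF-8."""
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "UTF-8 (BOM)"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "UTF-16LE"                      # label is an assumption
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "UTF-16BE"                      # label is an assumption
    try:
        data.decode("utf-8")                   # strict: raises on any invalid byte
        return "UTF-8"
    except UnicodeDecodeError:
        return "windows-1252"                  # fallback guess for Western text


print(detect_encoding("café".encode("utf-8")))
print(detect_encoding("Привет".encode("cp1251")))
```

Note that the fallback is a guess, not a detection: the Cyrillic sample lands on Windows-1252 simply because it is not valid UTF-8.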

Legacy non-BOM encodings like Windows-1251 (Cyrillic), Shift_JIS (Japanese), or Big5 (Traditional Chinese) cannot be unambiguously detected from bytes alone. If auto-detect picks Windows-1252 but your file is actually Cyrillic or East Asian, pick the right encoding from the dropdown and watch the After preview update in real time.
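
A short Python illustration of that ambiguity, using Python's `cp1251` and `cp1252` codecs as stand-ins for the dropdown entries: the same six bytes decode without error under both encodings, so nothing in the bytes says which reading is right.

```python
data = "Привет".encode("cp1251")     # bytes CF F0 E8 E2 E5 F2

# Both decodes succeed; only one produces readable Russian.
as_cyrillic = data.decode("cp1251")
as_western = data.decode("cp1252")

print(as_cyrillic)
print(as_western)
```

A human (or a language-aware heuristic) has to judge which result looks like real text, which is why the dropdown and live preview exist.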

Choosing the Right Encoding for Your Language

Auto-detect handles BOM-marked files and valid UTF-8 files automatically. For legacy files without BOMs, use this guide to pick the most likely encoding from the dropdown:

Language or script | Try first | If that fails
English, French, Spanish, German, Italian, Portuguese, Dutch | Windows-1252 (Western) | ISO-8859-1 (Latin-1)
Polish, Czech, Hungarian, Romanian, Slovak, Slovenian, Croatian | Windows-1250 (Central European) | ISO-8859-2 (Latin-2)
Russian, Ukrainian, Bulgarian, Serbian Cyrillic, Belarusian | Windows-1251 (Cyrillic) | UTF-8 (modern files)
Japanese | Shift_JIS | UTF-8 (modern files)
Simplified Chinese | GB18030 | UTF-8 (modern files)
Traditional Chinese | Big5 | UTF-8 (modern files)
Korean | EUC-KR | UTF-8 (modern files)
Mixed-script / multilingual | UTF-8 | Try reverse mojibake fix

The "Try first" column reflects the encoding that historically shipped with subtitle files for that language before UTF-8 became universal around 2010. Files produced after about 2015 are most often UTF-8 regardless of language.
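
The same "try first, then fall back" routine can be scripted when you have many files. The sketch below is a hypothetical Python helper, not part of the tool; the `CANDIDATES` keys and the mapping from dropdown names to Python codec names are assumptions made for illustration.

```python
# Python codec names standing in for the dropdown entries (assumed mapping).
CANDIDATES = {
    "western": ["cp1252", "latin-1"],
    "central_european": ["cp1250", "iso8859_2"],
    "cyrillic": ["cp1251", "utf-8"],
    "japanese": ["shift_jis", "utf-8"],
    "simplified_chinese": ["gb18030", "utf-8"],
    "traditional_chinese": ["big5", "utf-8"],
    "korean": ["euc_kr", "utf-8"],
}


def preview_candidates(data: bytes, script: str, limit: int = 2000) -> dict:
    """Decode data with each candidate for the script, keeping `limit` chars."""
    previews = {}
    for encoding in CANDIDATES[script]:
        # errors="replace" keeps the preview lenient: bad bytes become U+FFFD.
        previews[encoding] = data.decode(encoding, errors="replace")[:limit]
    return previews


sample = "こんにちは".encode("shift_jis")
for encoding, text in preview_candidates(sample, "japanese").items():
    print(encoding, "->", text)
```

A preview full of U+FFFD replacement characters is a strong hint that the candidate is wrong; a preview of plausible words in the expected script suggests it is right.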

Step-by-Step Recipes for Common Scenarios

Below are explicit walkthroughs for the five most common subtitle encoding problems.

Recipe 1: Fixing a Russian subtitle file with boxes or question marks

If a Russian subtitle file shows boxes, question marks, or replacement characters instead of Cyrillic letters, the file is almost certainly Windows-1251 (the standard Cyrillic encoding) but is being read as UTF-8.

  1. Upload the file. Detected encoding will likely show windows-1252 (the fallback, since Windows-1251 Cyrillic bytes fail strict UTF-8 validation).
  2. Open the "Interpret file as" dropdown and select Windows-1251 (Cyrillic).
  3. The After preview should now show readable Russian text — Привет, Здравствуйте, and so on.
  4. Leave reverse mojibake off and BOM off. Click Download Fixed Subtitle.
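
For batch work, the recipe above amounts to decoding with the right source encoding and re-encoding as UTF-8. This is a minimal Python sketch under that assumption; `transcode_to_utf8` and `fixed_name` are hypothetical helper names, and the `-fixed` naming simply mirrors the download behaviour described earlier on this page.

```python
from pathlib import PurePath


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes with the chosen source encoding and re-encode as UTF-8."""
    text = data.decode(source_encoding)   # raises if the bytes do not fit
    return text.encode("utf-8")


def fixed_name(filename: str) -> str:
    """Insert -fixed before the extension: movie.srt -> movie-fixed.srt."""
    path = PurePath(filename)
    return f"{path.stem}-fixed{path.suffix}"


legacy = "1\n00:00:01,000 --> 00:00:03,000\nПривет\n".encode("cp1251")
fixed = transcode_to_utf8(legacy, "cp1251")
print(fixed_name("movie.srt"))
print(fixed.decode("utf-8"))
```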

Recipe 2: Fixing French subtitles with sequences like â€™ and Ã©

If French (or any Western European language) subtitles show literal â€™, Ã©, Ã¨, Ã  characters in the visible text, the file is double-encoded: UTF-8 was misread as Windows-1252 and then re-saved as UTF-8.

  1. Upload the file. Detected encoding will show UTF-8 because the file IS valid UTF-8 — it just contains mojibake characters as its actual content.
  2. The Before preview shows the mojibake characters.
  3. Click the Try reverse mojibake fix button.
  4. The After preview now shows clean French — café, résumé, c'est, and so on.
  5. Click Download Fixed Subtitle.
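
What the reverse fix does can be approximated in Python. This is a sketch of the general technique (map each visible character back to its Windows-1252 byte, then decode the bytes as UTF-8), not the tool's actual implementation; `reverse_mojibake` is a hypothetical name, and returning the input unchanged on failure is a design choice made here.

```python
# Bytes that Windows-1252 leaves undefined; browsers show them as C1 controls.
UNDEFINED_IN_1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def reverse_mojibake(text: str) -> str:
    """Undo one layer of UTF-8-read-as-Windows-1252 corruption.

    Returns the input unchanged if it does not look double-encoded.
    """
    raw = bytearray()
    for char in text:
        try:
            raw += char.encode("cp1252")
        except UnicodeEncodeError:
            if ord(char) not in UNDEFINED_IN_1252:
                return text            # e.g. Cyrillic or CJK: not this pattern
            raw.append(ord(char))      # control character back to its byte
    try:
        return raw.decode("utf-8")     # strict: clean text fails and is kept
    except UnicodeDecodeError:
        return text


print(reverse_mojibake("cafÃ©, rÃ©sumÃ©, câ€™est"))
print(reverse_mojibake("café"))
```

Because the final decode is strict, text that is already clean (such as "café") fails the UTF-8 check and is returned as-is instead of being damaged.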

Recipe 3: Fixing Japanese subtitles that show random symbols

Japanese subtitle files released before about 2010 are commonly encoded as Shift_JIS rather than UTF-8. Reading them as UTF-8 produces a mix of replacement characters and unreadable glyphs.

  1. Upload the file. Detected encoding will likely show windows-1252 (the fallback).
  2. From the dropdown, select Shift_JIS (Japanese).
  3. The After preview should now show readable Japanese — both kanji and kana.
  4. Click Download Fixed Subtitle.

Recipe 4: Fixing subtitles that display correctly in VLC but break on Smart TVs

Older Smart TV firmware sometimes requires UTF-8 with a BOM to recognize multi-byte characters. A BOM-less UTF-8 file may display fine in VLC but show boxes or wrong characters on TVs.

  1. Upload the file. Detected encoding should show UTF-8.
  2. Leave the encoding picker on UTF-8.
  3. Check the Add UTF-8 BOM checkbox.
  4. Click Download Fixed Subtitle. The downloaded file is identical to the source but with an EF BB BF BOM prefix.
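
The BOM step is a three-byte prefix and nothing more. A minimal Python sketch, with `add_utf8_bom` as a hypothetical helper name:

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prefix UTF-8 bytes with EF BB BF unless a BOM is already present."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


with_bom = add_utf8_bom("Café".encode("utf-8"))
print(with_bom[:3].hex(" "))
```

When writing text from Python directly, the built-in `utf-8-sig` codec produces the same prefix on encode and strips it on decode.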

Recipe 5: Fixing AI-transcribed subtitles with corrupted smart quotes

AI transcription tools sometimes produce subtitle files where smart quotes have been corrupted to mojibake sequences after a round-trip through legacy software.

  1. Upload the file. Detected encoding will likely show UTF-8.
  2. If you see â€™, â€œ, or â€¦ in the Before preview, click Try reverse mojibake fix.
  3. The After preview should show clean punctuation.
  4. Click Download Fixed Subtitle.

When to Use the Subtitle Encoding Fixer

  • After downloading a subtitle file that displays gibberish: accented characters turned into question marks, or sequences like â€™ or Ã©.
  • After converting subtitles between platforms — some older video editors export legacy encodings while modern players expect UTF-8.
  • When subtitles play correctly in one app but break in another — usually an encoding mismatch rather than a corruption.
  • Before uploading subtitles to YouTube, Vimeo, or other platforms that require valid UTF-8.
  • When passing subtitles between Windows, macOS, and Linux systems that have different default encodings.
  • After AI auto-transcription tools occasionally produce files with mixed or unexpected encodings.

Who Uses This Tool

Video editors and captioners working with subtitle files from international sources. Translators bridging files between language editors that default to different encodings. YouTubers, Vimeo creators, and broadcasters who need clean UTF-8 deliveries. Anyone who downloaded a subtitle pack from a non-English release and saw boxes or random symbols instead of real letters.

Why Use This Subtitle Encoding Fixer

  • Auto-detects UTF-8 (with or without BOM), UTF-16LE, and UTF-16BE on load.
  • Ten manual encoding options covering Western, Cyrillic, Central European, Latin, Japanese, Chinese, and Korean scripts.
  • Reverse mojibake toggle for the common double-encoding case.
  • Optional UTF-8 BOM on export for older Windows tools.
  • Always exports valid UTF-8, regardless of input encoding.
  • Runs entirely in your browser — no upload, no account, no install.
  • Preserves all timestamps, cue numbers, and formatting tags untouched — only the byte-to-character mapping changes.

Glossary of Subtitle Encoding Terms

Key terminology you will see in encoding documentation, error messages, and across subtitle tools.

Character encoding
A mapping between binary byte sequences and the characters they represent. The same bytes can mean different characters in different encodings, which is why subtitle files sometimes display correctly in one tool and incorrectly in another.
Code point
A unique number assigned to each character in the Unicode standard. The letter "A" is code point U+0041; the symbol "♥" is U+2665; the smart apostrophe ’ is U+2019.
UTF-8
The dominant character encoding for modern text files. It uses 1 byte for ASCII characters and 2 to 4 bytes for everything else. UTF-8 is the encoding the WHATWG Encoding Standard requires for new web content, and it is the default for modern subtitle workflows.
UTF-16
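An encoding that uses 2 bytes for most characters and 4 bytes (a surrogate pair) for code points above U+FFFF. UTF-16 files usually start with a byte-order mark: FF FE for little-endian, FE FF for big-endian. A UTF-16 file without a BOM is legal but cannot be auto-detected by this tool.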
BOM (Byte Order Mark)
A few bytes at the start of a file that signal the encoding to the reader. UTF-8 BOM is EF BB BF; UTF-16LE BOM is FF FE; UTF-16BE BOM is FE FF. BOMs are optional for UTF-8 and, while not strictly required, strongly recommended for UTF-16, where they also tell the reader the byte order.
Windows-1252
A single-byte encoding designed for Western European languages. Often confused with ISO-8859-1 (Latin-1) but slightly different. It's the most common source of mojibake: UTF-8 bytes misread as Windows-1252 produce sequences like Ã©, while Windows-1252 bytes misread as UTF-8 produce replacement characters.
ANSI
A legacy Windows term that refers to whichever legacy code page is active on a given Windows system: usually Windows-1252 in English/Western European installations, Windows-1251 on Russian Windows, etc. ANSI is NOT a specific encoding; it's a system setting.
Mojibake
Garbled text that appears when bytes are interpreted with the wrong character encoding. The name comes from Japanese 文字化け (moji-bake), literally "character transformation."
Replacement character
A symbol (typically U+FFFD, shown as a black diamond with a question mark, or as an empty box) used to represent bytes that don't form a valid character in the chosen encoding.
Code page
Another name (chiefly Windows) for a character encoding. Windows-1252 is also called "Code Page 1252" or "CP1252."
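
Several of these terms can be seen side by side in a short Python check. The sketch below is illustrative only; `describe` is a hypothetical helper, and the emoji is included purely to show a code point above U+FFFF.

```python
def describe(char: str) -> tuple:
    """Return (code point label, UTF-8 byte count, UTF-16 byte count)."""
    code_point = f"U+{ord(char):04X}"
    utf8_len = len(char.encode("utf-8"))
    # An explicit byte order ("-le") means Python adds no BOM to the count.
    utf16_len = len(char.encode("utf-16-le"))
    return code_point, utf8_len, utf16_len


for char in ["A", "é", "’", "♥", "😀"]:
    print(char, *describe(char))
```

ASCII takes one byte in UTF-8, accented Latin letters take two, most other symbols take three, and characters above U+FFFF take four bytes in both UTF-8 and UTF-16 (the latter as a surrogate pair).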

Frequently Asked Questions

Why are my subtitles showing weird characters like â€™ or Ã©?

Sequences like â€™ or Ã© are called mojibake. They appear when a UTF-8 file is opened with a different encoding, usually Windows-1252. Byte sequences that should have shown smart quotes or accented letters end up displayed as two or three Windows-1252 characters each. Selecting the correct source encoding or enabling reverse mojibake usually restores the original letters.

What is mojibake?

Mojibake is garbled text caused by reading bytes with the wrong character mapping — for example, displaying UTF-8 bytes as if they were Latin-1. The name comes from Japanese 文字化け (moji-bake), literally "character transformation." The result is sequences of random accents, question marks, or replacement symbols instead of real words.

What encoding should I save subtitles in?

UTF-8 is the modern standard for SRT, VTT, and TXT subtitles because it supports every script in a single file. This tool always exports UTF-8 regardless of the input encoding. If older players such as VLC on Windows or Windows Media Player show non-English characters incorrectly, enable the UTF-8 BOM checkbox before downloading.

How does the auto-detect work?

The tool checks for a byte-order mark first — UTF-8 BOM, UTF-16LE BOM, or UTF-16BE BOM — and uses the matching encoding when found. If there's no BOM, it tries strict UTF-8 decoding; if the bytes are valid UTF-8, it uses that. If neither works, it falls back to Windows-1252 as a sensible guess for Western text.

What encodings does the tool support?

The dropdown lists ten encodings: UTF-8, Windows-1252 (Western), Windows-1251 (Cyrillic), Windows-1250 (Central European), ISO-8859-1 (Latin-1), ISO-8859-2 (Latin-2), Shift_JIS (Japanese), GB18030 (Simplified Chinese), Big5 (Traditional Chinese), and EUC-KR (Korean). The tool uses the browser's built-in TextDecoder API so support depends on your browser, though modern browsers cover all of these.

What does the "reverse mojibake fix" toggle do?

It handles a specific corruption pattern: UTF-8 bytes that were once misread as Windows-1252 and then saved again as UTF-8. The toggle reinterprets the visible mojibake characters back as raw bytes and decodes them as UTF-8. Try it if the standard encoding choices don't fully clean up text that contains literal sequences like Ã©.

Do I need to add a UTF-8 BOM?

Most modern subtitle players and video apps read UTF-8 correctly with or without a BOM. Some older Windows-era tools, including some configurations of VLC and Windows Media Player, expect a BOM before they treat the file as UTF-8 and may display non-English characters as gibberish without one. Try the BOM checkbox if those tools misread your file.

Is my subtitle file uploaded to a server?

No. The Subtitle Encoding Fixer reads your file as bytes directly in your browser using the FileReader API and decodes it using TextDecoder. Nothing is sent to any server, no account is required, and no copy of your file is stored. Close the browser tab and the data is gone.

Why does the preview only show the first 2000 characters?

To keep the page responsive on large subtitle files. The preview is for visual confirmation that your chosen encoding produces readable text; the full file is decoded and exported when you click Download. A long film's subtitles can run to tens of thousands of characters, and rendering all of them in a preview pane would be slow.

How can I tell what encoding my subtitle file is in?

There's no foolproof way to identify a file's encoding from the bytes alone, because the same bytes can be valid in multiple encodings. Open the file in a text editor that reports the encoding it detected: Notepad++ on Windows and BBEdit on macOS both display it. Or upload it here and read the "Detected encoding" badge after auto-detect runs.

What is ANSI encoding, and is it the same as UTF-8?

ANSI is a legacy Windows term that means "whichever legacy code page is your system default." On English Windows it's usually Windows-1252; on Russian Windows it's Windows-1251. ANSI is NOT the same as UTF-8: code pages such as Windows-1252 and Windows-1251 use one byte per character, while UTF-8 uses 1 to 4 bytes. Save as UTF-8 for cross-platform compatibility.

Why does my subtitle work in VLC but not on my Smart TV?

VLC handles subtitle encoding very tolerantly, while Smart TVs often have stricter requirements. The most common difference is BOM expectation — some TVs require a UTF-8 BOM to recognize multi-byte characters. Try downloading your file with the "Add UTF-8 BOM" checkbox enabled, then test on the TV again.

Can it fix subtitles in a language I don't speak?

Yes, but you'll need to know roughly which script the file should contain. If the original is Russian, try Windows-1251. For Japanese, try Shift_JIS. For Korean, try EUC-KR. The After preview will look like real words in that script when the right encoding is selected, even if you can't read it personally.

Does this tool change the subtitle timing or text content?

No. The Subtitle Encoding Fixer only changes how the existing bytes are interpreted as characters — it doesn't touch timestamps, cue numbers, line breaks, or formatting tags. The corrected output has identical structure to the input, just with characters that decode correctly. Use the dedicated time shifter or overlap fixer if you also need timing changes.
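
One way to see why cue numbers and timestamps survive any encoding choice: they consist only of ASCII characters, and those bytes decode identically under every encoding in the dropdown. A small Python check, using Python codec names as assumed stand-ins for the ten dropdown entries:

```python
# Assumed Python equivalents of the ten dropdown encodings.
ENCODINGS = ["utf-8", "cp1252", "cp1251", "cp1250", "latin-1",
             "iso8859_2", "shift_jis", "gb18030", "big5", "euc_kr"]

timing_line = b"00:00:01,000 --> 00:00:03,000"

# Decode the same bytes with every encoding and collect the distinct results.
decoded = {encoding: timing_line.decode(encoding) for encoding in ENCODINGS}
print(set(decoded.values()))
```

Only bytes above 0x7F, which carry accented letters and non-Latin scripts, are interpreted differently from one encoding to the next.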

Does this tool work with .ass or .ssa subtitle formats?

The tool reads any text file and only changes how its bytes are interpreted as characters. ASS and SSA subtitle files are plain text under the hood, so encoding fixes work the same as for SRT. If the upload picker won't accept .ass directly, rename to .txt, fix the encoding, then rename back to .ass. Cue formatting is preserved.

What if my file is in an encoding not in the dropdown?

The tool supports the ten most common subtitle encodings. If your file uses something rarer — KOI8-R, Windows-1256 Arabic, ISO-8859-7 Greek, or another variant — you can sometimes find a close-enough match in the existing list. If results still look wrong, open the file in a text editor that supports your specific encoding, copy the text, and save fresh as UTF-8.

Other free subtitle tools you can use alongside the Encoding Fixer: