Subtitle Formats

Subtitles come in several formats and file types. When converting videos, it is important to know which type of subtitle you are working with and what its limitations are. It is also important to know how each video file format handles subtitles.

Text-Based Subtitle Formats

As the name implies, these subtitles are files of text. You can open and edit these files in any text editor. These are the only types of subtitles that can be muxed into MP4/M4V videos using MP4tools. They can also be permanently burned into these videos. MKVtools can also mux or burn these subtitles into MKV videos. There are a number of different formats of text-based subtitles. SUBtools can import and export the following:

SubRip (SRT)

SubRip is probably the most basic of the text-based subtitle formats. It can be identified by the .srt extension. A file consists of blocks like the following, separated by a blank line. Each block holds a sequence number, a line with the start and end times, and the dialog text.

1
00:00:03,500 --> 00:00:07,000
This is the first dialog line.

2
00:00:11,000 --> 00:00:14,000
This is the second dialog line.

It has a very limited set of text formatting options, applied with HTML-like tags. SUBtools supports bold, italic, and underline formatting. How well the formatting is honored depends on the player. The formatting will not be honored by Apple hardware.
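
The block layout shown above lends itself to simple parsing. The following is a minimal Python sketch, not a description of how SUBtools reads files. The names Cue, to_ms, and parse_srt are hypothetical, and the code assumes well-formed blocks whose first line is a numeric index.

```python
import re
from dataclasses import dataclass

# Matches a timing line such as "00:00:03,500 --> 00:00:07,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def to_ms(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(content):
    """Split SRT text into Cue objects, skipping blocks without a timing line."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        fields = match.groups()
        cues.append(
            Cue(
                index=int(lines[0].strip()),
                start_ms=to_ms(*fields[:4]),
                end_ms=to_ms(*fields[4:]),
                text="\n".join(lines[2:]),
            )
        )
    return cues


SAMPLE = """1
00:00:03,500 --> 00:00:07,000
This is the first dialog line.

2
00:00:11,000 --> 00:00:14,000
This is the second dialog line.
"""

cues = parse_srt(SAMPLE)
```

Storing times as integer milliseconds keeps later comparisons, such as checking whether two dialog lines overlap, free of string handling.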

Typical problems that can occur when trying to incorporate these files into a video (burning or muxing) or during playback include:

  • Overlapping times - Other subtitle formats can have dialog that is on the screen at the same time. If an SRT subtitle is converted from one of these subtitles, it is possible that the times of successive dialog lines will overlap. This can cause problems in SRT subtitles. SUBtools will automatically find these time conflicts and can resolve many of them.
  • Extra/missing blank lines - Depending on the encoding process and/or the playback method, the omission or addition of blank lines other than the single line separating each dialog block can cause errors. SUBtools will automatically fix these issues when it opens a subtitle file.
  • Text encoding - See the section below for a discussion on text encodings. In general, to minimize issues, it's best if SRT subtitle files are created using UTF-8 text encoding. SUBtools has a preference that will let you set the default text encoding when saving or transferring SRT subtitles.
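
To illustrate the overlapping-times problem, here is a small Python sketch that finds overlaps and resolves them by ending the earlier line when the next one starts. This is only one possible strategy and is an assumption for illustration; it is not necessarily how SUBtools resolves conflicts. The function names and the (start_ms, end_ms, text) tuple layout are hypothetical.

```python
def find_overlaps(cues):
    """Return index pairs (i, i + 1) where cue i ends after cue i + 1 starts.

    cues is a list of (start_ms, end_ms, text) tuples sorted by start time.
    """
    return [
        (i, i + 1)
        for i in range(len(cues) - 1)
        if cues[i][1] > cues[i + 1][0]
    ]


def trim_overlaps(cues):
    """Return a copy of cues in which no cue ends after the next one starts."""
    ordered = sorted(cues, key=lambda cue: cue[0])
    fixed = []
    for i, (start, end, text) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_start = ordered[i + 1][0]
            # Clip the end time, but never let it fall before the start time.
            end = max(start, min(end, next_start))
        fixed.append((start, end, text))
    return fixed


cues = [
    (1000, 4000, "First line"),
    (3500, 6000, "Second line"),
    (7000, 8000, "Third line"),
]

overlaps = find_overlaps(cues)
fixed = trim_overlaps(cues)
```

Clipping shortens the time the earlier line stays on screen. Lines that genuinely need to be displayed together would have to be merged into one block instead, which this sketch does not attempt.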

SubStation Alpha (SSA) and Advanced SubStation Alpha (ASS)

These are much more advanced subtitle formats that offer numerous text formatting options and karaoke-like effects. They can be identified by the .ssa or .ass extension. Advanced SubStation Alpha is an extension of SubStation Alpha that offers an expanded feature set.

These files consist of up to five sections. The three sections found in most files are:

  • The [Script Info] section contains general information about the subtitle file
  • The [V4 Styles] section (named [V4+ Styles] in ASS files) contains a list of style definitions. A style describes how text assigned to it will appear on the screen. Most of these options can be edited within SUBtools.
  • The [Events] section contains the list of text and timing for each line of dialog. It also specifies the style to be used for each line of dialog and can include location information and code for various effects.

These types of subtitles cannot be muxed into MP4/M4V files, though they can be burned in, and the formatting will be retained. They can be either muxed or burned into MKV videos.
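
The section layout can be explored with a short Python sketch. It is illustrative only: the sample file is shortened and hypothetical (real style definitions have many more fields), and split_sections and parse_events are made-up helper names. The sample uses the ASS section name [V4+ Styles].

```python
def split_sections(content):
    """Group the non-empty, non-comment lines of a file under their [Section] name."""
    sections = {}
    current = None
    for raw_line in content.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def parse_events(lines):
    """Turn Dialogue lines into dicts keyed by the names on the Format line."""
    fields = []
    events = []
    for line in lines:
        kind, _, rest = line.partition(":")
        if kind == "Format":
            fields = [name.strip() for name in rest.split(",")]
        elif kind == "Dialogue" and fields:
            # The last field (the text) may itself contain commas,
            # so limit the number of splits.
            values = rest.strip().split(",", len(fields) - 1)
            events.append(dict(zip(fields, values)))
    return events


SAMPLE = """[Script Info]
Title: Example
ScriptType: v4.00+

[V4+ Styles]
Format: Name, Fontname, Fontsize
Style: Default,Arial,20

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:03.50,0:00:07.00,Default,,0,0,0,,Hello, this is the first dialog line.
"""

sections = split_sections(SAMPLE)
events = parse_events(sections["Events"])
```

Each event carries the name of its style, which is how a line of dialog is tied back to a definition in the styles section.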

 

Text-Based Subtitles and Text Encodings

The words and sentences that we read are, in essence, collections of characters (letters, punctuation marks, symbols, ...). Computers, however, don't read characters. They understand numbers. So there needs to be a "bridge" between the world of characters and the world of numbers. To accomplish this, characters are grouped into character sets, and each character in a set is assigned a unique number, or code, which is represented on the computer by one or more bytes. Characters are stored on a computer using this code. A Character Encoding (or Text Encoding) is the key for translating the code back into characters.

A difficulty arises in that there are a number of different Character Sets and Text Encodings. For example, the number 121 might translate to an "M" with Text Encoding A, but it might translate to "<" with Text Encoding B. So to "read" a character-based file, like a subtitle file, it is important to know which text encoding was used to create the file. Unfortunately, there is no foolproof way to determine the text encoding used.
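
A concrete case can be shown with a few lines of Python. The byte value and the three encodings below were chosen for illustration; they are not taken from SUBtools.

```python
# One stored byte, interpreted with three different text encodings.
raw = bytes([0xE9])

as_latin1 = raw.decode("latin-1")  # Western European: 'é'
as_cp1251 = raw.decode("cp1251")   # Cyrillic: 'й'

# In UTF-8 this byte can only start a multi-byte sequence,
# so on its own it cannot be decoded at all.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

The same byte yields two different but equally plausible characters, which is why a wrong guess usually shows up as strange characters in the dialog rather than as an error.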

When using SUBtools, being aware of the text encoding is important in two situations:

  • When you open the file, you need to know which text encoding to use so that you end up with the correct characters. There is no automatic way to do this. You have to open the subtitle and then check the results. If you're seeing strange characters in the dialog, or if you can't even open the file, then you need to try a different encoding.
  • When you save the file, you need to specify the text encoding to use. Note it does not have to be the same as the encoding used to open the file. You set this in the preferences. UTF-8 is probably the best choice.
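
The open-then-check workflow described above can be sketched in Python. This is a hedged illustration, not SUBtools' behavior: decode_with_candidates is a hypothetical helper, and the candidate list is an arbitrary example that should be adjusted to the languages you expect.

```python
def decode_with_candidates(data, candidates=("utf-8-sig", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes without error.

    A successful decode does not prove the encoding is right; the text still
    has to be checked by eye. "latin-1" accepts any byte sequence, so it acts
    as a last resort here.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Bytes as they might be read from a subtitle file saved in Windows-1252.
raw = "Café".encode("cp1252")

text, used = decode_with_candidates(raw)

# Saving with a different encoding than the one used to open the file.
saved = text.encode("utf-8")
```

Trying UTF-8 first is a reasonable default because text in other encodings that contains accented characters rarely happens to be valid UTF-8, so a failure there is a useful signal.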

 

Image-Based Subtitles

As the name implies, these subtitles are in essence a collection of images. These types of subtitles cannot be muxed into MP4/M4V videos using MP4tools. They can only be burned. MKVtools can either burn or mux these subtitles into MKV videos.

Since these subtitles are image-based, they are difficult to edit. Converting them to text-based subtitles, which are easier to edit, requires optical character recognition (OCR) software, and the results will typically need some correction. SUBtools has this capability. It can import and export the following:

VOBSUB

This subtitle format is generated by exporting the subtitles from a DVD. It consists of two files. The ".sub" file is in essence a collection of bitmap images, one for each subtitle line. The ".idx" file is text-based and contains general information about the subtitle, including the time codes and the location of each subtitle image within the .sub file.
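
Because the .idx file is plain text, its time codes can be read with a short script. The Python sketch below assumes the commonly seen entry layout "timestamp: HH:MM:SS:mmm, filepos: <hexadecimal offset>"; that layout, the sample content, and the name parse_idx are assumptions for illustration and should be checked against a real file.

```python
import re

# Assumed entry layout: "timestamp: 00:00:03:500, filepos: 000001000"
ENTRY = re.compile(
    r"timestamp:\s*(\d+):(\d{2}):(\d{2}):(\d{3}),\s*filepos:\s*([0-9a-fA-F]+)"
)


def parse_idx(content):
    """Return (start_ms, byte_offset) pairs for every timestamp entry found."""
    entries = []
    for line in content.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # comments, size, palette, language lines, and so on
        hours, minutes, seconds, millis = (int(v) for v in match.groups()[:4])
        start_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
        # The offset into the .sub file is assumed to be hexadecimal.
        entries.append((start_ms, int(match.group(5), 16)))
    return entries


SAMPLE = """# VobSub index file, v7 (do not modify this line!)
size: 720x480
id: en, index: 0
timestamp: 00:00:03:500, filepos: 000000000
timestamp: 00:00:11:000, filepos: 000001000
"""

entries = parse_idx(SAMPLE)
```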

Presentation Graphic Stream (PGS)

This subtitle format is generated by exporting the subtitles from a Blu-ray Disc. It consists of one file with the .sup extension.

Muxing vs. Burning

Muxing

  • Also known as Soft Subtitles or Soft Coding
  • Typically, the subtitle can be turned "on and off"
  • When muxing subtitles, the video can be passed through, so there is no quality loss in the output file and the process is relatively quick
  • The types of subtitle files that can be muxed depend on the output file type
    • MP4/M4V - only text-based subtitles can be muxed
    • MKV - all types of subtitle formats can be muxed
    • AVI - technically, subtitles cannot be muxed, but most players will treat an external subtitle file with the same name as the video file like a muxed subtitle.
  • When muxing into MP4/M4V videos, formatting information is typically lost and the hardware will determine properties like font size

Burning

  • Also known as Hard Subtitles or Hard Coding
  • When burning subtitles, the video must be re-encoded, so there can be quality loss in the output file and the process takes significantly longer than muxing
  • These subtitles cannot be turned off. They are always displayed.
  • All of the subtitle formatting information is maintained
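
The muxing rules stated on this page can be collected into a small lookup table. The Python sketch below only restates those rules; the container and format keys and the function can_mux are hypothetical names, and the table does not claim to cover what other tools support.

```python
# Subtitle formats that can be muxed into each container, as described above.
# MP4/M4V accepts only SRT here because SSA/ASS must be burned into those files.
MUXABLE = {
    "mp4": {"srt"},
    "m4v": {"srt"},
    "mkv": {"srt", "ssa", "ass", "vobsub", "pgs"},
    "avi": set(),  # relies on an external subtitle file instead
}


def can_mux(container, subtitle_format):
    """Return True if the subtitle format can be muxed into the container."""
    allowed = MUXABLE.get(container.lower(), set())
    return subtitle_format.lower() in allowed


checks = {
    ("mp4", "srt"): can_mux("mp4", "srt"),
    ("m4v", "ass"): can_mux("m4v", "ass"),
    ("mkv", "pgs"): can_mux("mkv", "pgs"),
    ("avi", "srt"): can_mux("avi", "srt"),
}
```

When can_mux returns False for MP4/M4V or MKV output, burning is the remaining option, at the cost of re-encoding the video.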