Captions

Captions are text versions of dialogue and important sounds in a video. Captions allow people who are deaf, hard of hearing, or have difficulty processing audio content to understand multimedia. Others may prefer to use captions in certain situations, such as when a speaker is hard to understand or in a noisy environment.

Captions are typically displayed at the bottom of multimedia, such as movies or videos. They communicate the dialogue and important sounds to anyone who cannot hear them. Captions must accurately match the media’s audio. Inaccurate captions can confuse and frustrate people who need them to understand the media’s content. In some instances inaccurate captions are merely annoying, but they can also have real consequences. Imagine a test question based on a video: if the captions you depend on are not accurate, you could fail the test.

You can add captions manually or generate them automatically. Auto-generated captions are often free, but the quality is nearly always below what viewers need to understand what is going on. These captions frequently include mistakes about dialogue and who is speaking. They also do not include any information about important sounds. That said, auto-generated captioning can be a temporary patch when a better service is not possible, and its quality is improving as many companies work on it. It is best to provide manual captions, or to edit the auto-generated captions until the viewer can fully understand the dialogue from the captions alone.

Captions can be closed or open:

  • Closed captions can be turned on or off. They are created and stored separately from the video and audio tracks. They can be easily updated, searched, and turned into a transcript.
  • Open captions are burned directly onto the image. Because they become part of the visual information, they cannot be adapted, searched, or used separately from the video.
  • Closed captions are always a better choice than open captions.
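
Because closed captions live in their own file, ordinary text tools can read them. The sketch below assumes the captions are stored as WebVTT, a common caption file format (this page does not name a format), and that every timestamp uses the full hh:mm:ss.mmm form. The sample cues and the function names parse_cues, to_transcript, and search are made up for illustration.

```python
import re

# Sample closed-caption file in WebVTT format (contents invented for illustration).
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:04.000
NARRATOR: Welcome to the course.

00:00:04.500 --> 00:00:06.000
[DOOR SLAMS]

00:00:06.500 --> 00:00:09.000
MARIA: Sorry I'm late.
"""

# Assumption: timestamps always include hours (hh:mm:ss.mmm).
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


def parse_cues(vtt_text):
    """Return a list of (start, end, text) tuples from simple WebVTT text."""
    cues = []
    # Cues are separated from each other by blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                # Everything after the timing line is the caption text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((match.group(1), match.group(2), text))
                break
    return cues


def to_transcript(cues):
    """Join the caption text into a plain transcript, one cue per line."""
    return "\n".join(text for _, _, text in cues)


def search(cues, term):
    """Return the start time of every cue whose text contains the term."""
    term = term.lower()
    return [start for start, _, text in cues if term in text.lower()]
```

Calling to_transcript(parse_cues(SAMPLE_VTT)) yields the caption text without timing lines, and search(parse_cues(SAMPLE_VTT), "late") returns the start time of each matching cue. Open captions offer nothing comparable, because their text exists only as pixels in the video image.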

Providing good quality captions takes some pre-planning, but the benefits are worth the effort.

Captions provide access to audio content in multimedia for people who are Deaf or hard of hearing.

Captions also help individuals who have trouble understanding spoken dialogue. This difficulty may stem from a cognitive disability or from being a non-native speaker of the video’s language. It may also arise because someone in the video is talking quickly, has an accent, is talking over other audio, or speaks more quietly than the rest of the audio track.

Captions help people in noisy environments, when audio speakers are not available, or when audio might disturb others and headphones are not available.

Captions also make video content searchable, and this helps many people who are looking for specific information in a video.

Captions do not help everyone or fit every situation:

  • Audio-only files need Transcripts (Ta11y) instead of captions.
  • People who are blind need Audio Descriptions.
  • People who are deaf-blind need Descriptive Transcripts (Ta11y).

The Web Content Accessibility Guidelines (WCAG) are international standards that apply to all digital content. Two WCAG criteria address captions:

  • WCAG Success Criterion 1.2.2: For pre-recorded video, captions are required at WCAG Level A to provide access to the audio content to people who are Deaf or hard of hearing.
  • WCAG Success Criterion 1.2.4: For live video, captions are required at WCAG Level AA to provide the audio content to people who are Deaf or hard of hearing.

Accessibility laws usually require compliance with WCAG level A and AA criteria. Level A criteria represent the minimum level of accessibility that you must meet. Level AA criteria provide a functional level of accessibility for most people with disabilities. Level AAA criteria are best practices and are strongly recommended to achieve a higher level of accessibility.

The U.S. has several other regulations that impact captions. Regardless of which rules apply, captions should meet the following basic guidelines:

  • Accurately transcribe any speech in the audio content.
  • Include important noises such as music that set the mood or sounds that provide context.
  • Display the caption text at the same time as the words and sound they represent.
  • Follow a captioning style guide for consistency (see the “Captioning Style Guides” list below under “Resources”).
  • Provide closed captions, not open.
  • Display captions in a location that does not obscure the video content.
  • If possible, give users the ability to manage the appearance and location of captions.
  • Include all spoken words
  • Identify speakers if necessary for comprehension, including off-screen speakers
    • Use the speaker’s name if known
    • If the speaker’s name is unknown, use another label such as “NARRATOR,” “SPEAKER 1,” or “LAWYER”
    • You can use either all caps or parentheses for speaker identification, but be consistent
  • Include sound events that have meaning or impact the story
    • Enclose sound effects in brackets and use capital letters
  • Use italics for:
    • Narration and voiceovers
    • Out-of-scene dialogue
    • Dialogue from a radio, television, phone, or other electronic device
    • Any included song lyrics
    • Well-known foreign sayings
    • Emphasis in speech (use sparingly)
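
The formatting conventions above can be sketched as a few small helpers. This is only an illustration: the function names are hypothetical, the colon after the speaker label is one common choice rather than a rule stated here, and the italic helper assumes a caption format such as WebVTT that accepts <i> tags.

```python
def speaker_label(name=None, fallback="SPEAKER 1"):
    """Return an all-caps label, using a generic one when the name is unknown."""
    return (name or fallback).upper()


def dialogue(text, name=None, fallback="SPEAKER 1"):
    """Prefix spoken text with the speaker label (colon separator is an assumption)."""
    return f"{speaker_label(name, fallback)}: {text}"


def sound_effect(description):
    """Enclose a meaningful sound in brackets and capital letters."""
    return f"[{description.strip().upper()}]"


def italic(text):
    """Mark narration, off-screen dialogue, lyrics, and similar text as italic.

    Assumes the caption format supports <i> tags, as WebVTT does.
    """
    return f"<i>{text}</i>"
```

For example, dialogue("Sorry I'm late.", "Maria") produces "MARIA: Sorry I'm late." and sound_effect("door slams") produces "[DOOR SLAMS]". Routing every cue through the same helpers is one way to keep speaker identification consistent across a video.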

For detailed guidance on formatting captions, use a guide like the Described and Captioned Media Program’s Captioning Key (see “Captioning Style Guides” under “Resources” for more captioning guides).

Videos often rely on auto-generated captions, which are usually of inferior quality.

You must always edit auto-generated captions to improve their quality or replace them with manually created captions. If you provide a recording of a live event and have a transcript from the captions used during the event, you can often upload it to a service and sync it with the video. If the recording was pre-scripted, you can often use that text to create captions quickly and accurately.

In all cases, someone will need to review the caption file while listening to the video and ensure the captions are accurate. Often, they will need to:

  • Correct words in captions to match dialogue
  • Correct spelling errors
  • Correct and capitalize acronyms
  • Add speaker names
  • Add punctuation and capitalization
  • Synchronize captions with the audio by creating new breakpoints
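
A script cannot replace the human review described above, but it can point the reviewer at cues that are likely to need attention. The following is a minimal sketch under stated assumptions: cues are (start, end, text) tuples with hh:mm:ss.mmm timestamps, and the three checks (timing order, overlap, missing capitalization and punctuation) are illustrative choices, not an established standard.

```python
def to_seconds(timestamp):
    """Convert an 'hh:mm:ss.mmm' timestamp to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def review_flags(cues):
    """Return (cue_number, message) pairs for cues a reviewer should check."""
    flags = []
    previous_end = 0.0
    for number, (start, end, text) in enumerate(cues, start=1):
        start_s, end_s = to_seconds(start), to_seconds(end)
        if end_s <= start_s:
            flags.append((number, "end time is not after start time"))
        if start_s < previous_end:
            flags.append((number, "overlaps the previous cue"))
        if not text.strip():
            flags.append((number, "empty caption text"))
        elif text == text.lower() and not any(ch in text for ch in ".?!,"):
            # Raw auto-captions often arrive as unpunctuated lowercase text.
            flags.append((number, "no capitalization or punctuation"))
        previous_end = max(previous_end, end_s)
    return flags
```

A flagged cue still has to be compared against the audio. The script only narrows down where to listen first, and an unflagged cue can still contain wrong words.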

You must ensure that the words in the captions match the words that are spoken. Anyone captioning or cleaning up captions should be familiar with basic captioning guidance.

Many streaming services provide auto-generated captions and allow you to use a third-party captioning service. The quality of auto-captions is improving, but the error rate remains high enough that they can be difficult to understand, particularly if the event has multiple speakers, music, or other auditory noise or complexity.

You should arrange for a captioning service for any live streams. If that is not possible, use auto-captioning during the live event and add better-quality captions before posting the final video online. Also, consider offering sign language interpreting on request.

Each live-streaming platform for online events supports captions in different ways. The following table links to auto-captioning instructions and streaming service set-up instructions by platform.

| Platform | Auto-generated Captions | Set up Streaming Services |
|---|---|---|
| Facebook | Turned on by default | Closed caption “how to” guide |
| YouTube | Turn on YouTube captions | Send captions to YouTube |
| Zoom | Enable automated captions | Use a third-party captioning service; Manage manual captions |
Table 1: Providing Captions for Live Streaming

Each platform supports captions for recorded videos in different ways. The following table links to manual and auto-generated captioning instructions by platform.

| Platform | Auto-generated Captions | Manual Captions |
|---|---|---|
| Facebook | Turned on by default | Add captions to Facebook video |
| YouTube | Turn on captions | Add or edit captions; Add subtitle editor access |
| Zoom | Audio transcriptions for cloud recordings | Edit the audio transcript |
Table 2: Adding Captions for Recorded Video

The first and most critical test for captions is simply to ensure that all audio-visual (synchronized media) files and events have captions. For recorded videos, manually inspect the captions to ensure that the basic guidelines for editing captions have been followed and that the captions accurately represent the audio portion of the video.

It is strongly recommended that you use a manual captioning service. If you use auto-generated captioning services, you should test them with samples of typical video before purchasing the service to ensure they provide as accurate a transcription as possible.
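
When testing an auto-captioning service on sample video, it helps to put a number on accuracy. This page does not prescribe a metric; one widely used option is word error rate (WER), the word-level edit distance between a trusted human transcript and the service's output, divided by the length of the human transcript. The sketch below is a plain implementation of that idea, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    reference:  trusted transcript produced or checked by a person
    hypothesis: auto-generated caption text for the same audio
    Comparison is case-insensitive; punctuation stays attached to words.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Levenshtein distance over words, keeping only one previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the captions
                current[j - 1] + 1,      # extra word in the captions
                previous[j - 1] + cost,  # matching or substituted word
            ))
        previous = current
    return previous[-1] / len(ref)
```

A result of 0.0 means the two texts match word for word, and higher values mean more errors. WER covers spoken words only, so speaker identification, sound descriptions, and timing still need a manual check.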