Best Practices for Video Captioning and Transcripts

Updated on Jan 21, 2025

Captions provide a text version of a video's speech and non-speech audio for people who are Deaf or hard of hearing, who need captions to understand the content. Captions are synchronized with the audio and are usually shown in a media player that lets users turn them on.

According to the Web Accessibility Initiative, "Automatically-generated captions do not meet user needs or accessibility requirements, unless they are confirmed to be fully accurate. Usually they need significant editing." Although creating your own captions is not required, doing so is best practice for making a video as accessible as possible. This article does not provide comprehensive instructions for creating captions. Instead, it focuses on best practices and tips for creating captions and transcripts for a video.

General Captions Tips

Captions should include:

All words spoken by people on screen exactly as they are spoken.
All words spoken by a narrator exactly as they are spoken.
The title, artist, and words to any song, indicated with a music symbol.
Identification of speakers who are off screen at the time.
Descriptions of sound events that affect the story or meaning.
Captions should not include informational text that is already displayed on screen, such as text on a PowerPoint slide.

Font

Characters should be displayed in a white sans-serif font, such as Arial or Helvetica.

Background

Captions should be contained within a black background box.

Sentence Style

Use normal grammar and sentence structure.
Each sentence should use a mix of uppercase and lowercase letters (sentence case).
Use punctuation to convey the way speech is delivered.
Use an ellipsis to indicate a significant pause in speech (e.g., That's...fantastic).
To maintain a normal reading rate, non-essential information can be removed.
Spell out numbers from one to ten, and use numerals for all numbers over ten (11, 12, etc.).
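As a minimal sketch, the number rule above can be expressed as a small lookup. The name `caption_number` is hypothetical, and the sketch assumes that any number outside one to ten (including zero, which the guideline does not address) stays as a numeral.

```python
# Hypothetical helper illustrating the caption number-style rule:
# spell out one through ten, use numerals for everything else.
NUMBER_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def caption_number(n: int) -> str:
    """Return n as it should appear in a caption."""
    # Numbers with no entry in NUMBER_WORDS fall back to numerals.
    return NUMBER_WORDS.get(n, str(n))
```

For example, `caption_number(3)` gives "three", while `caption_number(11)` gives "11".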

Spacing

No more than two or three lines of captions should be displayed on screen at one time.
Each line should be between 30 and 37 characters in length.
Line and caption breaks should reflect the natural flow of the sentence and its punctuation.
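A rough sketch of how the spacing guidelines might be applied when splitting text into captions, assuming Python's standard `textwrap` module for greedy word wrapping. The helper names and the two-line limit are illustrative choices, not part of any captioning tool. Starting a new caption at each sentence only approximates the "natural flow" guideline, so a human editor should still review the breaks.

```python
import re
import textwrap

MAX_CHARS_PER_LINE = 37    # upper end of the 30-37 character guideline
MAX_LINES_PER_CAPTION = 2  # conservative end of the two-to-three line guideline


def split_sentences(text: str) -> list[str]:
    """Naively split after ., ?, or ! followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def build_captions(text: str) -> list[list[str]]:
    """Group wrapped lines into captions, starting a new caption at each sentence."""
    captions = []
    for sentence in split_sentences(text):
        # Greedy wrap: each line holds as many whole words as fit in the width.
        lines = textwrap.wrap(sentence, width=MAX_CHARS_PER_LINE)
        for i in range(0, len(lines), MAX_LINES_PER_CAPTION):
            captions.append(lines[i:i + MAX_LINES_PER_CAPTION])
    return captions
```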

Timing

Captions should be synchronized with the visuals: when someone is speaking, the captions of their dialogue should be displayed on screen.
Captions need to remain on screen long enough to be read.
- The reading speed should not exceed 180 words per minute (three words per second).
- A sentence should remain on screen for at least two seconds.
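The timing guidelines reduce to simple arithmetic: at three words per second, a caption of n words needs at least n / 3 seconds on screen, but never less than two seconds. A minimal sketch, assuming each caption's start and end times are known in seconds and applying the two-second minimum to each caption rather than to each sentence (the helper names are hypothetical):

```python
WORDS_PER_MINUTE = 180
MAX_WORDS_PER_SECOND = WORDS_PER_MINUTE / 60  # 180 wpm = 3 words per second
MIN_SECONDS_ON_SCREEN = 2.0


def min_display_seconds(caption_text: str) -> float:
    """Shortest time this caption can stay on screen under the guidelines."""
    word_count = len(caption_text.split())
    return max(MIN_SECONDS_ON_SCREEN, word_count / MAX_WORDS_PER_SECOND)


def is_readable(caption_text: str, start_seconds: float, end_seconds: float) -> bool:
    """True if the caption is displayed long enough to be read."""
    return end_seconds - start_seconds >= min_display_seconds(caption_text)
```

For example, a 12-word caption needs at least 4 seconds, while a two-word caption still needs the two-second minimum.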

Sound and Speaker Identifications

Caption noises or music that enhance the visuals.
- For example, off-screen footsteps that announce someone's arrival.
In a conversation in a noisy public space, caption only the conversation of the people shown on screen.
Sound effects should be shown in square brackets, e.g., [dog barking].
An off-screen speaker's name should be identified in round brackets, e.g., (John).
Speakers' names and sound effects should be shown on a line of their own.
Caption silence or muffled sounds if they add to the content or atmosphere.
- Identify moments when the sound cuts or fades out.
- Identify moments when speakers are not heard (e.g., a character moves their lips without making a sound).
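The bracket conventions above can be sketched as a small helper that builds the lines for one caption. The function `caption_lines` and its parameters are hypothetical; the sketch only shows where each label goes: sound effects in square brackets and off-screen speaker names in round brackets, each on a line of its own.

```python
from typing import Optional


def caption_lines(dialogue: Optional[str] = None,
                  speaker: Optional[str] = None,
                  off_screen: bool = False,
                  sound: Optional[str] = None) -> list[str]:
    """Build the text lines for a single caption."""
    lines = []
    if sound:
        # Sound effect on its own line, in square brackets.
        lines.append(f"[{sound}]")
    if dialogue:
        if speaker and off_screen:
            # Off-screen speaker's name on its own line, in round brackets.
            lines.append(f"({speaker})")
        lines.append(dialogue)
    return lines
```

For example, `caption_lines("I'm home!", speaker="John", off_screen=True)` returns `["(John)", "I'm home!"]`, while an on-screen speaker gets no label.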

Transcript Tips

All media with audio should also include a transcript. A transcript is a text document containing all of a video's spoken content that is viewed separately from the video. Transcripts provide an additional format for making audio and video content accessible. People who are Deaf, are hard of hearing, or have low vision can benefit from accessible transcripts.

A transcript document should follow all best practices for creating accessible documents. Please see the following resources for creating accessible documents:

Transcripts should include:

All words spoken by people on screen exactly as they are spoken.
All words spoken by a narrator exactly as they are spoken.
Identification of all speakers.
Descriptions of sound events.
Transcripts should not include informational text that is already displayed on screen, such as text on a PowerPoint slide.

Sound and Speaker Identifications

The transcript should include speaker identification in square brackets, e.g., [Professor] or [Name].
The transcript should include non-speech sounds in square brackets, e.g., [footsteps], [cough], or [pop music].

Timestamps

A transcript should include as few timestamps as possible; when it comes to timestamps in transcripts, less is more. Possible timestamp locations include:

A change in speaker
A slide change
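A minimal sketch of a transcript that follows these conventions, assuming the spoken content has already been split into segments with a start time and a speaker. The `Segment` class and `render_transcript` function are hypothetical. Here a timestamp is written only when the speaker changes (slide changes are not modeled), and non-speech sounds stay inline in square brackets.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: int  # when this segment begins in the video
    speaker: str        # e.g. "Professor"
    text: str           # spoken words, with sounds like [cough] inline


def format_timestamp(seconds: int) -> str:
    """Format seconds as hh:mm:ss."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments: list[Segment]) -> str:
    lines = []
    previous_speaker = None
    for seg in segments:
        if seg.speaker != previous_speaker:
            # Timestamp and speaker label only at a change in speaker.
            lines.append(f"{format_timestamp(seg.start_seconds)} [{seg.speaker}]")
            previous_speaker = seg.speaker
        lines.append(seg.text)
    return "\n".join(lines)
```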

Additional Resources

Southern New Hampshire University
