A podcast transcript is a written record of every word spoken in a podcast episode. It is the text version of the audio: same content, different format.
If you have ever read YouTube subtitles or downloaded an SRT file for a video, you have already worked with transcribed speech. The concept is the same for podcasts.
What is a podcast transcript?
A podcast transcript converts the spoken audio of a podcast episode into text. Depending on the tool and settings, a transcript can include:
- The full spoken text, word for word
- Speaker labels (identifying who said what)
- Timestamps (marking when each segment begins)
- Paragraph breaks or utterance-level segmentation
For a deeper dive into how transcripts are actually created and which method is right for you, see how to transcribe a podcast.
The World Wide Web Consortium (W3C) defines captions and transcripts as essential for making audio content accessible. Their Web Content Accessibility Guidelines (WCAG) 2.1, Success Criterion 1.2.1, specifically requires text alternatives for audio-only content.
A transcript is not the same as show notes. Show notes summarize an episode. A transcript contains every word. This distinction matters for accessibility compliance, SEO, and searchability, because show notes cannot substitute for a full text alternative.
Podcast transcript formats explained
Different formats exist for different reasons.
Plain text (TXT)
A plain text transcript is the full spoken content in a readable text file. No timestamps, no special formatting. Think of it as the episode in paragraph form.
This is the most universal format. You can read it, search it, copy from it, and paste it into any document or note-taking app.
SRT (SubRip Text)
SRT is a subtitle format originally built for video. Each block has a timestamp and the corresponding text:
1
00:00:14,000 --> 00:00:18,000
Welcome back to the show. Today we have
a really fascinating conversation lined up.
SRT files work in video players, presentation tools, and platforms like YouTube and Vimeo. The W3C WebVTT specification references SRT as a predecessor format.
If you want to add subtitles or closed captions to a video version of your podcast, SRT is the format you need.
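As a rough illustration of the SRT layout described above, here is a minimal Python sketch that builds SRT text from timed segments. The function names `srt_timestamp` and `to_srt` and the tuple layout `(start_seconds, end_seconds, text)` are assumptions made for this example, not part of any particular tool.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple layout is an assumption for this sketch.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    # SRT blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


segments = [
    (14.0, 18.0, "Welcome back to the show. Today we have\n"
                 "a really fascinating conversation lined up."),
]
srt_text = to_srt(segments)
print(srt_text)
```

Each block gets a running number, a timing line, and the caption text, which matches the sample block shown earlier in this section.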
VTT (Web Video Text Tracks)
VTT (often called WebVTT) is the W3C standard for web captions. It looks similar to SRT but supports more styling and positioning options. The W3C published the WebVTT specification as the official format for text tracks in HTML5 video.
VTT files work in browsers, HTML5 players, and most modern video platforms. If you are publishing captions on the web, VTT is the better choice over SRT.
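Because the two formats are so close, a basic SRT file can often be converted to VTT with very little work. The sketch below assumes a simple SRT input with no styling tags: it adds the `WEBVTT` header and swaps the comma in each timing line for a period. The function name `srt_to_vtt` is hypothetical.

```python
import re

# Matches an SRT timestamp such as 00:00:14,000 and captures both sides of the comma.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT (a sketch, not a full converter)."""
    lines = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are rewritten, so commas in the spoken text are untouched.
            line = _SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    # A WebVTT file starts with the WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


srt_sample = """1
00:00:14,000 --> 00:00:18,000
Welcome back to the show. Today we have
a really fascinating conversation lined up.
"""
vtt_text = srt_to_vtt(srt_sample)
print(vtt_text)
```

The numeric block index from SRT is kept as the cue identifier, which WebVTT allows but does not require.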
PDF
A PDF transcript is a formatted document, usually with speaker names, timestamps, and paragraph breaks. It is the format to send to a client, include in a report, or attach to research notes.
What does a podcast transcript look like?
A transcript with speaker labels looks something like this:
Speaker 01 [00:00:14]: Welcome back to the show. Today we have
a really fascinating conversation lined up that I think you're
all going to enjoy.
Speaker 02 [00:00:28]: Thanks so much for having me. I've been
looking forward to this discussion for a while.
Without speaker labels, the same text appears as a continuous block with no attribution.
Speaker diarization is the process of identifying and labeling different speakers in a recording. When diarization is applied, each speaker gets a label (Speaker 01, Speaker 02, etc.) and their lines are tagged accordingly. This makes multi-speaker episodes far easier to follow. For more on how this works, see what is speaker diarization.
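To make the labeling step concrete, here is a small Python sketch that turns diarized segments into the labeled layout shown above. It assumes the diarization step has already produced a speaker number, a start time in seconds, and the text for each segment; the `Segment` class and `render` function are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: int   # speaker number assigned by diarization (assumed input)
    start: float   # start time in seconds
    text: str


def clock(seconds: float) -> str:
    """Format seconds as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def render(segments) -> str:
    """Merge consecutive segments by the same speaker and label each turn."""
    turns = []
    for seg in segments:
        if turns and turns[-1].speaker == seg.speaker:
            # Same speaker keeps talking: extend the current turn.
            last = turns[-1]
            turns[-1] = Segment(last.speaker, last.start, last.text + " " + seg.text)
        else:
            turns.append(seg)
    return "\n".join(
        f"Speaker {t.speaker:02d} [{clock(t.start)}]: {t.text}" for t in turns
    )


segments = [
    Segment(1, 14.0, "Welcome back to the show."),
    Segment(1, 16.5, "Today we have a really fascinating conversation lined up."),
    Segment(2, 28.0, "Thanks so much for having me."),
]
labeled = render(segments)
print(labeled)
```

Merging consecutive segments from the same speaker is a design choice for readability; a tool that needs utterance-level timestamps would keep them separate.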
Who uses podcast transcripts?
Different people use transcripts for different reasons.
Podcast listeners use transcripts to find a specific quote, review a detail they heard, or read instead of listening when they cannot play audio.
Content creators and podcasters use transcripts to write show notes, create social media posts, pull quotes, and repurpose episodes into blog posts or newsletters. According to Edison Research's The Infinite Dial 2024, over 183 million Americans listened to podcasts in 2024. Almost none of that audio is searchable without a transcript.
Researchers and journalists use transcripts to cite specific claims, fact-check statements, and search across multiple episodes for patterns. A PDF transcript can be attached to research notes or cited in academic work.
People who are deaf or hard of hearing rely on transcripts as an accessibility tool. Under WCAG guidelines, audio content needs a text alternative. Transcripts are the simplest way to provide one.
Businesses and teams use transcripts to turn meeting-style podcast episodes into searchable internal documents, or to share podcast insights with colleagues who do not have time to listen.
How are podcast transcripts created?
Three main approaches exist, and they differ in speed, accuracy, and effort.
Manual transcription
Someone listens to the audio and types what they hear. This is the most accurate method but also the slowest. A professional transcriptionist takes roughly 4 hours to transcribe 1 hour of audio, according to the Association for Recorded Sound Collections. At typical rates of $1 to $2 per minute, a 60-minute episode costs $60 to $120.
Manual transcription makes sense when accuracy is critical and you have the budget and time.
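The figures above reduce to simple arithmetic. This sketch uses the 4:1 time ratio and the $1 to $2 per-minute rates quoted in this section as defaults; the function names are illustrative only.

```python
def manual_cost_range(audio_minutes: float, low_rate: float = 1.0, high_rate: float = 2.0):
    """Return (low, high) cost in dollars for manual transcription.

    Rates are dollars per minute of audio, defaulting to the range quoted above.
    """
    return audio_minutes * low_rate, audio_minutes * high_rate


def manual_work_hours(audio_hours: float, ratio: float = 4.0) -> float:
    """Estimate typing time, assuming roughly 4 hours of work per hour of audio."""
    return audio_hours * ratio


low, high = manual_cost_range(60)
print(f"60-minute episode: ${low:.0f} to ${high:.0f}")
print(f"Estimated work time: {manual_work_hours(1):.0f} hours")
```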
AI-powered transcription
A speech recognition model processes the audio and produces a transcript automatically. Modern models like Deepgram Nova-3, OpenAI Whisper, and Google Speech-to-Text handle a 1-hour episode in 2 to 5 minutes.
Accuracy varies. Deepgram reports over 90% word-level accuracy on clean audio. Accuracy drops with background noise, overlapping speech, or heavy accents.
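Word-level accuracy is commonly derived from word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the AI output into a reference transcript, divided by the number of reference words. The sketch below is a generic WER calculation in Python, assuming accuracy is reported as 1 minus WER. It is not Deepgram's benchmark methodology, which this article does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER as word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "welcome back to the show today we have a guest"
hypothesis = "welcome back to the show today we had a guest"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2f}, word accuracy: {1 - wer:.2f}")
```

In this example one word out of ten is wrong, so the WER is 0.10 and the word accuracy is 0.90.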
AI transcription is the fastest and cheapest option. It works well for most use cases and costs a fraction of manual transcription. For a detailed cost comparison, see podcast transcription cost in 2026.
Built-in platform transcripts
Some platforms provide transcripts natively:
- Apple Podcasts has started rolling out auto-generated transcripts on iOS 17.4 and later. These are readable in-app but not always exportable or accurate.
- Spotify offers transcripts for some shows, though coverage is inconsistent and the feature is still in rollout.
- YouTube generates automatic captions for uploaded videos. These are available via the "Open Transcript" button but are often error-prone.
Platform transcripts are convenient for reading but limited in accuracy, portability, and format options.
Why podcast transcripts matter
Accessibility
The WCAG guidelines require text alternatives for audio content. Without a transcript, a podcast is entirely inaccessible to people who are deaf or hard of hearing. This is not just a best practice. In many jurisdictions, including the US under the ADA and the EU under the European Accessibility Act (which applies from June 28, 2025), accessibility compliance is a legal requirement for public-facing content.
Searchability
Audio is not searchable. You cannot Ctrl+F a podcast episode. A transcript makes every word in an episode instantly findable. This changes how people interact with the content. Instead of scrubbing through a 90-minute episode to find one quote, you can search the text.
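Once an episode exists as text, finding a quote is an ordinary string search. A minimal Python sketch, assuming the transcript is available as a list of `(timestamp, text)` pairs (a hypothetical layout chosen for this example):

```python
def find_quote(segments, query: str):
    """Return every (timestamp, text) pair whose text contains the query.

    The match is case-insensitive.
    """
    needle = query.casefold()
    return [(stamp, text) for stamp, text in segments if needle in text.casefold()]


transcript = [
    ("00:00:14", "Welcome back to the show. Today we have a really fascinating conversation lined up."),
    ("00:00:28", "Thanks so much for having me."),
    ("00:01:05", "That conversation changed how I think about audio."),
]

matches = find_quote(transcript, "Conversation")
for stamp, text in matches:
    print(f"[{stamp}] {text}")
```

Keeping the timestamp next to each match means a listener can jump straight to that point in the audio.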
SEO
Search engines index text. They do not index audio. A transcript gives a podcast episode searchable, indexable content. According to a study by Podchaser cited by the BBC, pages with full transcripts can see significant increases in organic traffic because they contain the keywords people search for.
Content repurposing
A single episode transcript can become show notes, a blog post, social media quotes, an email newsletter, or research references. Without a transcript, you are working from memory or re-listening to the audio.
How to get a podcast transcript
Use an AI transcription tool
This is the fastest way. Paste a podcast URL into a service like Podtyper and get a full transcript with speaker labels, timestamps, and an AI-generated summary in under 3 minutes. Export as TXT, PDF, SRT, or VTT.
Download from the platform
If the podcast is on Apple Podcasts or YouTube, you may find a built-in transcript. Check the platform settings. On Apple Podcasts (iOS 17.4+), tap the transcript icon below the episode player. On YouTube, click the three-dot menu and select "Open Transcript."
These are free but harder to export and often less accurate than a dedicated tool.
Hire a transcriptionist
For near-perfect accuracy, hire someone. Services like Rev and Scribie charge per minute of audio. Expect a 24 to 48 hour turnaround and around $1 to $2 per minute.
Frequently Asked Questions
Is a podcast transcript the same as closed captions?
No. A transcript is a standalone text document. Closed captions are timed text overlays that appear alongside video or audio playback. A transcript can be converted to captions by adding timestamps, but the two are different formats for different contexts.
Can I use a podcast transcript for legal purposes?
It depends on the accuracy. AI-generated transcripts are not certified legal records. If you need a court-admissible transcript, you need a certified human transcriptionist. For reference, research, and internal use, AI transcripts are widely accepted.
Do podcasts have to provide transcripts?
Accessibility laws are moving in that direction. The European Accessibility Act requires many digital services to provide text alternatives for audio content starting in 2025. In the US, the DOJ has updated its ADA regulations for state and local governments to include web accessibility standards. Providing transcripts is the most straightforward way to comply.
What is the difference between a transcript and show notes?
A transcript contains every word spoken in an episode. Show notes summarize the episode, list links, and provide context. They are different things. WCAG specifically requires a full text alternative, not a summary.
How accurate are podcast transcripts?
AI transcription tools like Deepgram Nova-3 achieve over 90% word accuracy on clean audio, according to Deepgram's published benchmarks. Accuracy varies with background noise, overlapping speech, and accent diversity. For most use cases (search, show notes, quotes, accessibility), this accuracy is sufficient.