Pre Recorded Transcription

Transcribe pre-recorded audio files or URLs with Gladia. Use when the user needs batch/async transcription, speaker diarization, subtitles (SRT/VTT), PII red...

SkillRank score: 8.2/10

evaluated by implexa, claude-haiku-4-5 · 2026-06-15

pre-recorded-transcription wraps gladia's async api for batch audio/video transcription with diarization, pii redaction, subtitles, and translation. it takes an sdk-first approach and documents a raw rest fallback.

structure: 9.0
trigger phrases: 8.0
procedure: 9.0
edge cases: 7.0
documentation: 8.0

original SKILL.md from clawhub

---
name: pre-recorded-transcription
description: Transcribe pre-recorded audio files or URLs with Gladia. Use when the user needs batch/async transcription, speaker diarization, subtitles (SRT/VTT), PII redaction, translation, NER, summarization, chapterization, audio-to-LLM, or any audio intelligence on pre-recorded content. Always prefer the official SDK; fall back to raw REST only when SDK cannot satisfy the requirement.
license: MIT
---

# Pre-Recorded Transcription

Gladia's pre-recorded API transcribes audio and video files asynchronously.

> **SDK-first**: always use the official SDK — see [sdk-integration](../sdk-integration/SKILL.md) for policy, setup, and fallback criteria.

## When to Use

- Existing audio/video files or URLs (including social/video links)
- Batch or asynchronous transcription workflows
- Pre-recorded-only features: diarization, PII redaction, subtitles

**When NOT to use:** If the user needs real-time / live transcription of a stream, microphone, or ongoing audio feed, use the [live-transcription skill](../live-transcription/SKILL.md) instead. Live transcription uses WebSocket sessions, not the pre-recorded API.

## References

Consult these resources as needed:

- ./references/transcription-options.md -- Full options (JS + Python)
- ./references/managing-jobs.md -- `get`, `list`, `getFile`, `delete`
- ./references/delivery-and-response.md -- Response shape and events
- ../audio-intelligence/SKILL.md -- Feature availability and config
- ../sdk-integration/SKILL.md -- Setup, config, SDK vs raw API
- ../sdk-integration/references/sdk-versions.md -- Current SDK versions
- ../troubleshooting/SKILL.md -- Errors and diagnostics

## API Endpoints (reference — prefer SDK methods instead)

| Endpoint                    | Method | SDK equivalent                          |
| --------------------------- | ------ | --------------------------------------- |
| `/v2/upload`                | POST   | `transcribe()` auto-uploads local files |
| `/v2/pre-recorded`          | POST   | `create()` / `transcribe()`             |
| `/v2/pre-recorded`          | GET    | `list()`                                |
| `/v2/pre-recorded/:id`      | GET    | `get()` / `poll()` / `transcribe()`     |
| `/v2/pre-recorded/:id`      | DELETE | `delete()`                              |
| `/v2/pre-recorded/:id/file` | GET    | `getFile()`                             |

## Workflow

### Recommended (SDK)

The SDK `transcribe()` method handles upload, job creation, and polling in one call. Use this by default.

```typescript
const result = await client.preRecorded().transcribe("./audio.mp3", {
  language_config: { languages: ["en"] },
  diarization: true,
});

console.log(result.result?.transcription?.full_transcript);
```

```python
result = client.prerecorded().transcribe(
    "audio.mp3",
    {"language_config": {"languages": ["en"]}, "diarization": True},
)

print(result.result.transcription.full_transcript)
```

Audio input can be a local file path, HTTP(S) URL, social/video URL, or binary file object. For full input types, see [sdk-integration](../sdk-integration/SKILL.md#audio-input-types).

### Fallback (raw REST — only when SDK is not feasible)

Use raw REST only when SDK use is not possible.

1. **Upload** (if local file): `POST /v2/upload` with multipart form data → get `audio_url`
2. **Create job**: `POST /v2/pre-recorded` with `audio_url` and config → get `id`
3. **Poll**: `GET /v2/pre-recorded/:id` until `status: "done"` (or use webhooks/callbacks)
4. **Parse results**: Extract `transcription`, `diarization`, `translation`, etc. from response
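
Steps 2 to 4 of the fallback can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions, not Gladia's reference client: the `x-gladia-key` header name is an assumption to verify against the API reference, `http_json` and `transcribe_url` are hypothetical helper names, and the multipart upload of step 1 is left out, so the audio is assumed to be hosted already. The transport is injected so the flow can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.gladia.io/v2"


def http_json(method: str, url: str, api_key: str, body: Optional[dict] = None) -> dict:
    """Hypothetical transport helper: send an optional JSON body, return parsed JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("x-gladia-key", api_key)  # assumed header name
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def transcribe_url(
    audio_url: str,
    config: dict,
    api_key: str,
    send: Callable[..., dict] = http_json,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 100,
) -> dict:
    """Create a job for an already-hosted audio_url, then poll until it is done."""
    job = send("POST", f"{BASE_URL}/pre-recorded", api_key, {"audio_url": audio_url, **config})
    delay = 3.0
    for _ in range(max_polls):
        status = send("GET", f"{BASE_URL}/pre-recorded/{job['id']}", api_key, None)
        if status.get("status") == "done":
            return status
        sleep(delay)
        delay = min(delay * 2, 30.0)  # exponential backoff, capped at 30s
    raise TimeoutError(f"job {job['id']} not done after {max_polls} polls")
```

A production client would also stop early on a failure status; the exact status values should be taken from the API reference rather than from this sketch.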

## Managing Jobs

Use SDK methods to retrieve, list, download, and delete jobs:

- JavaScript: `client.preRecorded().get(id)`, `.list(filters)`, `.getFile(id)`, `.delete(id)`
- Python: `client.prerecorded().get(id)`, `.list(filters)`, `.get_file(id)`, `.delete(id)`

For full JS/Python examples, pagination filters, and REST equivalents, see [./references/managing-jobs.md](./references/managing-jobs.md).

## Transcription Options

All options are passed as the second argument to `transcribe()`. Key options:

| Option            | Description                                 |
| ----------------- | ------------------------------------------- |
| `language_config` | Expected languages, code switching          |
| `diarization`     | Speaker identification (pre-recorded only)  |
| `translation`     | Translate to target languages               |
| `summarization`   | Generate bullet points or paragraph summary |
| `subtitles`       | Generate SRT/VTT files                      |
| `pii_redaction`   | Redact PII (pre-recorded only)              |
| `audio_to_llm`    | Run custom LLM prompts on transcript        |
| `callback_url`    | Async webhook delivery                      |

For full option details, see [./references/transcription-options.md](./references/transcription-options.md). For audio intelligence config, see [audio-intelligence](../audio-intelligence/SKILL.md). For client-level retry/timeouts, see [sdk-integration](../sdk-integration/SKILL.md#configuration-options).

## Response and Delivery

For full response JSON and event names, see [./references/delivery-and-response.md](./references/delivery-and-response.md).

## Limits and Specifications

| Constraint              | Value                             |
| ----------------------- | --------------------------------- |
| Max file size           | 1000 MB                           |
| Max duration            | 135 minutes (120 min for YouTube) |
| Enterprise max duration | 4 h 15 min (255 minutes)          |
| Concurrency (paid)      | 25 concurrent jobs                |
| Concurrency (free)      | 3 concurrent jobs                 |
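
A pre-flight check against these limits can reject oversized inputs before any upload starts. The sketch below is illustrative only: `check_limits` is a hypothetical helper, 1 MB is assumed to mean 10^6 bytes, and the enterprise duration is assumed to take precedence over the YouTube cap, which the table does not state.

```python
from typing import List

MAX_FILE_BYTES = 1000 * 1000 * 1000      # 1000 MB, assuming 1 MB = 10**6 bytes
MAX_MINUTES = 135                        # standard limit
MAX_MINUTES_YOUTUBE = 120                # YouTube links
MAX_MINUTES_ENTERPRISE = 4 * 60 + 15     # 4 h 15 min


def check_limits(
    size_bytes: int,
    duration_minutes: float,
    *,
    youtube: bool = False,
    enterprise: bool = False,
) -> List[str]:
    """Return a list of human-readable limit violations (empty means OK)."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    if enterprise:
        limit = MAX_MINUTES_ENTERPRISE   # assumption: enterprise overrides the YouTube cap
    elif youtube:
        limit = MAX_MINUTES_YOUTUBE
    else:
        limit = MAX_MINUTES
    if duration_minutes > limit:
        problems.append(f"duration is {duration_minutes} min, limit is {limit} min")
    return problems
```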

## Polling Best Practices

The SDK handles polling automatically — `transcribe()` polls until the job completes with configurable `interval` and `timeout`:

```typescript
const result = await client.preRecorded().transcribe(audio, options, {
  interval: 5000, // Poll every 5s
  timeout: 600000, // Timeout after 10 minutes
});
```

If using raw REST instead of the SDK:

- Use webhooks or callbacks instead of polling when possible
- If polling, implement exponential backoff (start at 3s, max 30s)
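
The backoff schedule described above (start at 3s, double, cap at 30s) can be expressed as a small generator. This is a sketch of one reasonable schedule; the doubling factor and the `backoff_delays` name are assumptions, since the text only fixes the start and maximum.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(start: float = 3.0, cap: float = 30.0, factor: float = 2.0) -> Iterator[float]:
    """Yield poll delays in seconds: start, then multiply by factor, never above cap."""
    delay = start
    while True:
        yield delay
        delay = min(delay * factor, cap)


print(list(islice(backoff_delays(), 6)))  # [3.0, 6.0, 12.0, 24.0, 30.0, 30.0]
```

A polling loop would sleep for each yielded delay between status requests and stop once the job reports `status: "done"` or an overall timeout elapses.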

## Common Mistakes

- **Code switching without language list**: enabling `code_switching: true` with empty `languages` triggers 100+ language evaluation. Always provide 3-5 expected languages.
- **Polling without backoff**: rapid polling wastes requests and may trigger 429s. The SDK handles this; for raw REST, use webhooks or exponential backoff.
- **Expecting pre-recorded-only features in live mode**: diarization, PII redaction, and subtitles are available only through the pre-recorded API, not in live mode.
- **Wrong audio file path**: the audio download endpoint is `/v2/pre-recorded/:id/file`, not `/v2/pre-recorded/:id/audio`.

For the full list of gotchas and diagnostics, see the [troubleshooting skill](../troubleshooting/SKILL.md).

## Further Reading

- [Pre-recorded quickstart](https://docs.gladia.io/chapters/pre-recorded-stt/quickstart)
- [Audio intelligence overview](https://docs.gladia.io/chapters/pre-recorded-stt/audio-intelligence)
- [API reference: init](https://docs.gladia.io/api-reference/v2/pre-recorded/init)


extracted implicit sdk/rest workflow into explicit procedure steps, formalized decision points for file types and features, added input contract with api key and source types, structured output contract with json schema and limits, added outcome signals for success validation, and clarified common gotchas with concrete if-else branching.

Pre-Recorded Transcription

intent

transcribe pre-recorded audio and video files asynchronously using gladia's api. use this skill when you have existing audio/video files or urls (including social media links) and need batch transcription with features like speaker diarization, pii redaction, subtitles, translation, summarization, or audio intelligence. do not use this for live/real-time transcription of streams or microphone input; use the live-transcription skill instead.

inputs

gladia api credentials:

GLADIA_API_KEY: your gladia api key (required). set as environment variable or pass to sdk client config.
endpoint: https://api.gladia.io/v2 (default, included in sdk).

audio/video input source (one of):

local file path (string): relative or absolute path to audio/video file on disk.
http(s) url: publicly accessible audio/video url.
social/video url: youtube, tiktok, instagram, facebook, twitter, twitch links (gladia auto-downloads).
binary file object: file-like object or buffer (js/python sdk).
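
the sdk accepts all of these source types directly, so no classification is required before calling it. for logging or pre-flight checks, though, it can help to know which kind of source was supplied. the helper below is a hypothetical sketch (the name classify_source and its return labels are not part of the sdk) that tells the source kinds apart using only the standard library.

```python
import os
from urllib.parse import urlparse


def classify_source(source) -> str:
    """Return "binary", "url", or "local_path" for a transcription input."""
    if isinstance(source, (bytes, bytearray)) or hasattr(source, "read"):
        return "binary"          # raw bytes or a file-like object
    if isinstance(source, os.PathLike):
        return "local_path"      # pathlib.Path and friends
    if isinstance(source, str):
        scheme = urlparse(source).scheme.lower()
        # social/video links are ordinary http(s) urls from the caller's point of view
        return "url" if scheme in ("http", "https") else "local_path"
    raise TypeError(f"unsupported audio source type: {type(source).__name__}")
```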

sdk and client setup:

gladia javascript sdk: @gladia-io/gladia-node-sdk (npm).
gladia python sdk: gladia-sdk (pip).
see ../sdk-integration/SKILL.md for installation, auth setup, retry config, and timeout settings.

optional: webhook for async delivery:

callback_url: https webhook endpoint to receive job completion events (instead of polling). requires public https endpoint.

procedure

step 1: initialize gladia client

input: gladia api key (env var or explicit config). output: authenticated client object ready for transcription.

use the official sdk (javascript or python). pass your api key via environment variable (GLADIA_API_KEY) or explicit config. see ../sdk-integration/SKILL.md for full setup.

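// javascript (no explicit config, so the key comes from GLADIA_API_KEY)
import Gladia from "@gladia-io/gladia-node-sdk";
const client = new Gladia();

# python (no explicit config, so the key comes from GLADIA_API_KEY)
from gladia import Gladia
client = Gladia()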

step 2: prepare transcription options

input: user requirements (language, diarization, subtitles, translation, etc.). output: options object (javascript or python dict).

build your transcription config. pass as the second argument to transcribe(). common options:

language_config: object with a languages array (e.g., ["en", "es"]) and an optional code_switching: true when speakers switch languages within the recording.
diarization: boolean, identifies speakers (pre-recorded only).
translation: object with target_languages array to translate transcript.
summarization: object with type ("bullet_points" or "paragraph").
subtitles: object with format ("srt" or "vtt").
pii_redaction: boolean, masks pii in transcript.
audio_to_llm: object with model_name and prompt for custom llm processing.

for full option list and audio intelligence config, see ./references/transcription-options.md and ../audio-intelligence/SKILL.md.
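
as an illustration, the options dict can be assembled by a small builder that also enforces two rules stated in this document: code switching needs an explicit language list, and callback_url must be https. this is a sketch, not sdk code; build_options is a hypothetical helper, and it only emits keys that appear in the examples here (language_config, diarization, callback_url).

```python
from typing import List, Optional


def build_options(
    languages: List[str],
    *,
    code_switching: bool = False,
    diarization: bool = False,
    callback_url: Optional[str] = None,
) -> dict:
    """Build a transcription options dict, rejecting known-bad combinations."""
    if code_switching and not languages:
        raise ValueError("code_switching requires an explicit list of expected languages")
    if callback_url is not None and not callback_url.startswith("https://"):
        raise ValueError("callback_url must be an https endpoint")

    options: dict = {"language_config": {"languages": list(languages)}}
    if code_switching:
        options["language_config"]["code_switching"] = True
    if diarization:
        options["diarization"] = True
    if callback_url is not None:
        options["callback_url"] = callback_url
    return options
```

the returned dict is meant to be passed as the second argument to transcribe().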

step 3: call transcribe() with audio input and options

input: audio source (file path, url, or buffer) and transcription options. output: transcription result object with status, transcript, and any requested features (diarization, subtitles, etc.).

the sdk's transcribe() method handles upload, job creation, and polling in one call. pass the audio source and options:

// javascript
const result = await client.preRecorded().transcribe("./audio.mp3", {
  language_config: { languages: ["en"] },
  diarization: true,
});

# python
result = client.prerecorded().transcribe(
    "audio.mp3",
    {"language_config": {"languages": ["en"]}, "diarization": True},
)

the transcribe() method:

uploads local files automatically to /v2/upload and gets an audio_url.
creates a job at /v2/pre-recorded with your options.
polls /v2/pre-recorded/:id until job completes (status: "done").
returns the full result object.

polling behavior: sdk polls with configurable interval (default 3s) and timeout (default 600s). customize polling via optional third argument:

const result = await client.preRecorded().transcribe(audio, options, {
  interval: 5000,  // poll every 5 seconds
  timeout: 600000, // timeout after 10 minutes
});

step 4: extract and use results

input: result object from transcribe(). output: transcript text, speaker labels (if diarization), subtitles (if requested), translations, etc.

access results on the result.result object:

result.result.transcription.full_transcript: complete text transcript.
result.result.diarization: array of speaker segments with speaker_id, start_time, end_time, and transcript.
result.result.subtitles: srt or vtt formatted subtitle string.
result.result.translation: translated transcripts keyed by language code.
result.result.summarization: summary type and summary text.
result.result.metadata: audio duration, source audio url, and creation time.

decision points

if user has local audio file: pass file path (string) to transcribe(). sdk auto-uploads to /v2/upload and extracts audio_url internally.

if user has remote url (http/https or social link): pass url (string) directly to transcribe(). gladia downloads and processes.

if user needs real-time/live transcription: do not use this skill. route to live-transcription skill instead, which uses websocket streaming.

if user needs diarization, pii redaction, or subtitles: use pre-recorded api (this skill). these features are pre-recorded only and not available in live mode.

if user prefers async delivery over polling: provide callback_url in options. gladia posts a job completion event to your https webhook, so your code does not block on polling. the webhook must be a publicly reachable https endpoint. polling is the default when callback_url is omitted.

if sdk cannot satisfy requirement (edge case): fall back to raw rest api only as last resort. see ../sdk-integration/SKILL.md for fallback criteria. raw workflow: upload (if local), create job at /v2/pre-recorded, poll /v2/pre-recorded/:id, extract results.

if code switching enabled but no language list provided: gladia evaluates 100+ languages, causing slow processing and potential billing impact. always provide 3-5 expected languages when code_switching: true.

if job fails or times out: check error response. common causes: invalid audio url (404), unsupported format, file exceeds 1000 mb, duration exceeds limits. see ../troubleshooting/SKILL.md for diagnostics.

output contract

success response: result object with structure:

{
  "request_id": "job-uuid",
  "status": "done",
  "result": {
    "id": "job-uuid",
    "transcription": {
      "language": "en",
      "full_transcript": "the complete text here..."
    },
    "diarization": [
      {
        "speaker_id": 1,
        "start_time": 0.5,
        "end_time": 5.2,
        "transcript": "speaker 1 text..."
      }
    ],
    "subtitles": "1\n00:00:00,000 --> 00:00:05,000\nsubtitle text\n\n2\n00:00:05,000 --> 00:00:10,000\nnext subtitle\n",
    "translation": {
      "es": "transcripción completa aquí...",
      "fr": "transcription complète ici..."
    },
    "summarization": {
      "type": "bullet_points",
      "summary": "- key point 1\n- key point 2\n"
    },
    "metadata": {
      "duration_ms": 120500,
      "audio_url": "https://...",
      "created_at": "2024-01-15T10:30:00Z"
    }
  }
}
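
as a worked example of consuming this structure, the sketch below totals talk time per speaker from the diarization array. it assumes the response has been parsed into a plain dict with exactly the field names shown above (speaker_id, start_time, end_time); those names come from this illustrative contract and should be checked against a real response. talk_time_by_speaker is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict


def talk_time_by_speaker(response: dict) -> Dict[int, float]:
    """Sum (end_time - start_time) in seconds for each speaker_id."""
    totals: Dict[int, float] = defaultdict(float)
    segments = (response.get("result") or {}).get("diarization") or []
    for segment in segments:
        totals[segment["speaker_id"]] += segment["end_time"] - segment["start_time"]
    return dict(totals)
```

a missing or empty diarization array yields an empty dict, which is the expected outcome when diarization was not requested.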

file outputs (if requested):

subtitles saved as .srt or .vtt file if subtitles option enabled.
transcript can be written to .txt file.
full result can be serialized to .json.
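
a minimal sketch of writing these three outputs, assuming the response is available as a plain dict shaped like the success response above. save_outputs, the default file stem, and the subtitle_ext parameter are hypothetical; the caller picks the extension that matches the subtitle format it requested.

```python
import json
from pathlib import Path
from typing import List


def save_outputs(response: dict, out_dir, stem: str = "transcript", subtitle_ext: str = "srt") -> List[Path]:
    """Write .txt, subtitle, and .json files; return the paths actually written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    result = response.get("result") or {}
    written: List[Path] = []

    text = (result.get("transcription") or {}).get("full_transcript")
    if text:
        path = out / f"{stem}.txt"
        path.write_text(text, encoding="utf-8")
        written.append(path)

    subtitles = result.get("subtitles")
    if isinstance(subtitles, str) and subtitles:
        path = out / f"{stem}.{subtitle_ext}"
        path.write_text(subtitles, encoding="utf-8")
        written.append(path)

    # the full response is always kept so nothing is lost
    path = out / f"{stem}.json"
    path.write_text(json.dumps(response, ensure_ascii=False, indent=2), encoding="utf-8")
    written.append(path)
    return written
```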

rate limits:

paid tier: 25 concurrent jobs.
free tier: 3 concurrent jobs.
queue excess jobs on the client and poll job status so that the number of in-flight jobs stays within the concurrency limit.
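
one simple way to stay under the concurrency limit is a bounded worker pool: no more than max_concurrent calls run at once, and the rest wait in the pool's queue. the sketch below assumes each job is a blocking call (such as the sdk's transcribe()) wrapped in a caller-supplied function; transcribe_many and transcribe_one are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def transcribe_many(
    sources: Iterable[str],
    transcribe_one: Callable[[str], T],
    max_concurrent: int = 3,  # free-tier limit from this document; paid tier allows 25
) -> List[T]:
    """Run transcribe_one over sources with at most max_concurrent calls in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() preserves input order and re-raises the first worker exception
        return list(pool.map(transcribe_one, sources))
```

results come back in the same order as the inputs, which keeps it easy to pair each transcript with its source.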

file size and duration limits:

max file size: 1000 mb.
max duration: 135 minutes (120 min for youtube links).
enterprise tier: up to 4h15m duration.

outcome signal

the skill worked if:

result.status === "done" and result.result.transcription.full_transcript contains non-empty text.
transcript text matches expected content from the audio (spot check).
if diarization enabled: result.result.diarization array contains speaker segments with speaker_id, timestamps, and text.
if subtitles enabled: result.result.subtitles contains valid srt or vtt formatted string.
if translation enabled: result.result.translation object contains target language keys with translated text.
if pii redaction enabled: sensitive data (phone, ssn, email) is masked or removed from transcript.
if callback_url provided: webhook post received at callback endpoint with job completion event within 1-2 seconds of job completion.
if any feature fails: error response includes error.type and error.message describing the issue (see ../troubleshooting/SKILL.md).
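
several of these checks can be automated. the sketch below covers the structural ones (status, non-empty transcript, and presence of requested features) for a response held as a plain dict shaped like the output contract above. it does not attempt the spot check, pii verification, or webhook timing. check_outcome and its parameters are hypothetical.

```python
from typing import Iterable, List


def check_outcome(
    response: dict,
    *,
    expect_diarization: bool = False,
    expect_subtitles: bool = False,
    expect_translations: Iterable[str] = (),
) -> List[str]:
    """Return a list of failed outcome checks (empty means all checks passed)."""
    failures = []
    if response.get("status") != "done":
        failures.append(f"status is {response.get('status')!r}, expected 'done'")

    result = response.get("result") or {}
    transcript = (result.get("transcription") or {}).get("full_transcript") or ""
    if not transcript.strip():
        failures.append("full_transcript is empty")

    if expect_diarization and not result.get("diarization"):
        failures.append("diarization segments missing")

    subtitles = result.get("subtitles")
    if expect_subtitles and not (isinstance(subtitles, str) and subtitles.strip()):
        failures.append("subtitles missing")

    translations = result.get("translation") or {}
    for language in expect_translations:
        if language not in translations:
            failures.append(f"translation for {language!r} missing")
    return failures
```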
