Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi...
---
name: assemblyai-transcribe
description: >
Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI.
Use this skill when the user wants AssemblyAI specifically, needs high-quality speech-to-text
from a local file or URL, wants speaker labels or named speakers, language detection, subtitles,
paragraph/sentence exports, topic/entity/sentiment extraction, Speech Understanding, or agent-friendly
transcript output as Markdown or normalised JSON for downstream AI workflows.
compatibility: Requires Node.js 18+ with internet access and SKILLBOSS_API_KEY.
metadata:
author: OpenAI
version: "2.0.0"
homepage: https://www.assemblyai.com/docs
clawdbot:
skillKey: assemblyai
emoji: "ποΈ"
requires:
bins:
- node
env:
- SKILLBOSS_API_KEY
primaryEnv: SKILLBOSS_API_KEY
---
> π **Complete setup guide**: https://skillboss.co/skill.md
# AssemblyAI transcription, Speech Understanding, and agent-friendly exports
Use this skill when the user wants **AssemblyAI** rather than generic transcription, or when the job benefits from AssemblyAI-specific capabilities such as:
- model routing across `universal-3-pro` and `universal-2`
- language detection and code switching
- diarisation plus **speaker name / role mapping**
- translation, custom formatting, or AssemblyAI speaker identification
- subtitles, paragraphs, sentences, topic / entity / sentiment tasks
- transcript output that is easy for **other agents** to consume as Markdown or normalised JSON
The skill is designed for AI agents like OpenClaw, not just end users. It provides:
1. **A no-dependency Node CLI** in `scripts/assemblyai.mjs` (and a compatibility wrapper at `assemblyai.mjs`)
2. **Bundled model/language knowledge** via `models` and `languages` commands
3. **Stable transcript output formats**
- agent-friendly Markdown
- normalised agent JSON
- bundle manifests for downstream automation
4. **Speaker mapping workflows**
- manual speaker/channel maps
- AssemblyAI speaker identification
- merged display names in both Markdown and JSON
5. **AssemblyAI LLM Gateway integration** for structured extraction from transcripts
## Use this skill in this order
### 1) Decide whether the user needs AssemblyAI-specific behaviour
If they just want βa transcriptβ, a generic solution may be enough. Reach for this skill when the user mentions AssemblyAI, wants a specific AssemblyAI feature, or needs the richer outputs and post-processing this skill provides.
### 2) Pick the best entry point
- **New transcription** β `transcribe`
- **Existing transcript id** β `get` or `wait`
- **Re-render existing saved JSON** β `format`
- **Post-process an existing transcript** β `understand`
- **Run transcript text through LLM Gateway** β `llm`
- **Need a quick capability lookup before deciding** β `models` or `languages`
### 3) Prefer the agent-friendly defaults
For most unknown-language or mixed-language jobs, prefer:
```bash
node {baseDir}/assemblyai.mjs transcribe INPUT --bundle-dir ./assemblyai-out --all-exports
```
Why:
- the CLI defaults to **auto-best** routing when models are not specified
- it writes a **manifest + multiple files** that agents can inspect without reparsing terminal output
- Markdown and agent JSON become available immediately for follow-on steps
## Quick-start recipes
### Best general default
Use this when the source language is unknown or could be outside the 6-language Universal-3-Pro set:
```bash
node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --bundle-dir ./out --all-exports
```
This defaults to model routing plus language detection unless the request already specifies a model or language.
### Best known-language accuracy
If the language is known and supported by Universal-3-Pro, prefer an explicit request:
```bash
node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speech-model universal-3-pro --language-code en_us --bundle-dir ./out
```
### Meeting / interview with speaker labels
```bash
node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speaker-labels --bundle-dir ./out
```
### Add explicit speaker names or roles
Manual mapping:
```bash
node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speaker-labels --speaker-map @assets/speaker-map.example.json --bundle-dir ./out
```
AssemblyAI speaker identification:
```bash
node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --speaker-labels --speaker-type role --known-speakers "host,guest" --bundle-dir ./out
```
Or post-process an existing transcript:
```bash
node {baseDir}/assemblyai.mjs understand TRANSCRIPT_ID --speaker-type name --speaker-profiles @assets/speaker-profiles-name.example.json --bundle-dir ./out
```
### Translation
```bash
node {baseDir}/assemblyai.mjs transcribe ./meeting.mp3 --translate-to de,fr --match-original-utterance --bundle-dir ./out
```
### Structured extraction through LLM Gateway
```bash
node {baseDir}/assemblyai.mjs llm TRANSCRIPT_ID --prompt @assets/example-prompt.txt --schema @assets/llm-json-schema.example.json --out ./summary.json
```
## Command guidance
## `transcribe`
Use for local files or remote URLs.
- Local files are uploaded first.
- Public URLs are sent directly to AssemblyAI.
- Waits by default, then renders output.
Prefer `--bundle-dir` for anything longer than a trivial clip.
## `get` / `wait`
Use when you already have the transcript id. `wait` blocks until completion; `get` fetches immediately unless you add `--wait`.
## `format`
Use when you already saved:
- raw transcript JSON from AssemblyAI, or
- the normalised agent JSON produced by this skill
This is useful when you want to apply a new speaker map, re-render Markdown, or generate a fresh bundle without retranscribing.
## `understand`
Use when you need AssemblyAI **Speech Understanding** on an existing transcript:
- translation
- speaker identification
- custom formatting
This command fetches the transcript, merges in the returned understanding results, then renders updated Markdown / agent JSON / bundle outputs.
## `llm`
Use when the user wants:
- summaries
- extraction
- structured JSON
- downstream reasoning over the transcript
Prefer `--schema` when the next step is automated.
## Output strategy
### Best default for agents: bundle mode
`--bundle-dir` writes a directory containing:
- Markdown transcript
- agent JSON
- raw JSON
- optional paragraphs / sentences / subtitles
- a machine-readable manifest
This is usually better than dumping everything to stdout.
### Primary output kinds
Use `--export` to choose the main output:
- `markdown` (default)
- `agent-json`
- `json` / `raw-json`
- `text`
- `paragraphs`
- `sentences`
- `srt`
- `vtt`
- `manifest`
### Sidecar outputs
You can request extra files directly with:
- `--markdown-out`
- `--agent-json-out`
- `--raw-json-out`
- `--paragraphs-out`
- `--sentences-out`
- `--srt-out`
- `--vtt-out`
- `--understanding-json-out`
## Speaker mapping rules
Speaker display names are merged in this order:
1. manual `--speaker-map`
2. AssemblyAI speaker identification mapping
3. fallback generic names like `Speaker A` or `Channel 1`
This means you can let AssemblyAI identify speakers first, then still override individual display names later.
Example manual map file: `assets/speaker-map.example.json`
## Model and language lookup
Before choosing parameters, inspect the bundled reference data:
```bash
node {baseDir}/assemblyai.mjs models
node {baseDir}/assemblyai.mjs models --format json
node {baseDir}/assemblyai.mjs languages --model universal-3-pro
node {baseDir}/assemblyai.mjs languages --model universal-2 --codes --format json
```
The bundled data lives in:
- `assets/model-capabilities.json`
- `assets/language-codes.json`
## Important operating notes
- Keep API keys out of chat logs; use environment injection.
- Use the **EU** AssemblyAI base URL when the user explicitly needs EU processing.
- Uploads and transcript creation must use API keys from the **same AssemblyAI project**.
- Prefer `--bundle-dir` or `--out` for long outputs.
- The CLI is non-interactive and sends diagnostics to stderr, which makes it easier for agents to script reliably.
- Use raw `--config` or `--request` when you need a newly added AssemblyAI parameter that this skill has not exposed yet.
## Reference files
Read these when you need more depth:
- [Capabilities](references/capabilities.md)
- [Workflows and recipes](references/workflows.md)
- [Output formats](references/output-formats.md)
- [Speaker mapping](references/speaker-mapping.md)
- [LLM Gateway notes](references/llm-gateway.md)
- [Troubleshooting](references/troubleshooting.md)
## Key bundled files
- `assemblyai.mjs` β root wrapper for compatibility with the original skill
- `scripts/assemblyai.mjs` β main CLI
- `assets/speaker-map.example.json`
- `assets/speaker-profiles-name.example.json`
- `assets/speaker-profiles-role.example.json`
- `assets/custom-spelling.example.json`
- `assets/llm-json-schema.example.json`
- `assets/transcript-agent-json-schema.json`
## Sanity checks before finishing a task
- Did you pick the right region (`api.assemblyai.com` vs `api.eu.assemblyai.com`)?
- Did you choose a model strategy that matches the language situation?
- If speaker naming matters, did you enable diarisation and/or provide a speaker map?
- If the result will feed another agent, did you produce Markdown and/or agent JSON rather than only raw stdout?
- If the transcript will be machine-consumed, did you keep the manifest or explicit output filenames?
don't have the plugin yet? install it then click "run inline in claude" again.
by @clawhub