ElevenLabs TTS

Set up, test, benchmark, and use ElevenLabs text-to-speech as an independent TTS skill, including HTTP streaming, WebSocket streaming guidance, voice listing...

SkillRank score: 7.3/10

evaluated by implexa, claude-haiku-4-5 · 2026-07-11

xiaog-elevenlabs-tts covers independent ElevenLabs voice synthesis setup with HTTP/WebSocket streaming, voice selection, model tuning, and secure credential handling via environment variables.

- structure: 8.0
- trigger phrases: 6.0
- procedure: 8.0
- edge cases: 6.0
- documentation: 7.0

SKILL.md

---
name: elevenlabs-tts
description: Set up, test, benchmark, and use ElevenLabs text-to-speech as an independent TTS skill, including HTTP streaming, WebSocket streaming guidance, voice listing, model/output-format selection, latency tuning, and safe ELEVENLABS_API_KEY handling.
---

# ElevenLabs TTS

Use this skill for ElevenLabs (sometimes shortened to "ELE") voice generation, low-latency streaming TTS, voice ID lookup, voice listing, output format selection, or TTS benchmark work.

## Non-negotiables

- Keep this skill independent from Xiaomi MiMo TTS. Do not mix API keys, endpoint shapes, voice IDs, or scripts.
- Treat `ELEVENLABS_API_KEY` as a secret. Never echo it, write it into skill files, commit it, or expose it in logs/screenshots.
- Prefer environment variable injection for tests: `ELEVENLABS_API_KEY=... python3 ...`.
- If the key is missing, verify scripts and docs locally, then report that live API synthesis is blocked by missing credentials.
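
The key-handling rules above can be sketched as a small presence check. This is a minimal illustration, not the skill's own code: the function name `api_key_status` and its return strings are hypothetical, and only the `ELEVENLABS_API_KEY` variable name comes from this document.

```python
import os


def api_key_status(env=None):
    """Report whether ELEVENLABS_API_KEY is set, without revealing its value.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        # Matches the rule above: report the blocker instead of failing silently.
        return "missing: live API synthesis is blocked"
    # Deliberately return no part of the key (no prefix, suffix, or hash).
    return "present"


if __name__ == "__main__":
    print(api_key_status())
```

Returning a fixed string rather than a masked key keeps the secret out of logs and screenshots entirely.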

## Standard workflow

1. Verify API key availability without printing it.
2. List voices with `scripts/list_voices.py`; choose a requested voice by ID/name or use the first available voice for smoke tests.
3. Generate a short HTTP streaming sample with `scripts/tts_stream_http.py`.
4. Capture evidence: output path, bytes, chunks, `ttfb_seconds`, elapsed time, model, output format.
5. If audio validation tools exist (`ffprobe`, `afinfo`, or `file`), inspect the generated file before claiming success.
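
Steps 4 and 5 can be sketched as follows. This is an illustrative outline under stated assumptions, not the contents of `scripts/tts_stream_http.py`: `consume_stream` and `pick_audio_inspector` are hypothetical names, and the timer starts when `consume_stream` is called, so the iterable should issue the request lazily (for example, a generator) if `ttfb_seconds` is meant to include network time.

```python
import shutil
import time
from typing import Iterable, Optional


def consume_stream(chunks: Iterable[bytes], out_path: str) -> dict:
    """Write audio chunks to out_path and return benchmark evidence."""
    start = time.monotonic()
    ttfb = None
    total_bytes = 0
    chunk_count = 0
    with open(out_path, "wb") as fh:
        for chunk in chunks:
            if not chunk:
                continue  # ignore empty chunks so they do not count as audio
            if ttfb is None:
                # Time from call start to the first non-empty chunk.
                ttfb = time.monotonic() - start
            fh.write(chunk)
            total_bytes += len(chunk)
            chunk_count += 1
    return {
        "output": out_path,
        "bytes": total_bytes,
        "chunks": chunk_count,
        "ttfb_seconds": ttfb,
        "elapsed_seconds": time.monotonic() - start,
    }


def pick_audio_inspector() -> Optional[str]:
    """Return the first available validation tool named in step 5, if any."""
    for tool in ("ffprobe", "afinfo", "file"):
        if shutil.which(tool):
            return tool
    return None
```

If `pick_audio_inspector()` returns `None`, the evidence dictionary is still worth reporting, but the audio itself remains unvalidated.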

## Recommended defaults

- Model: `eleven_flash_v2_5` for lowest latency; `eleven_turbo_v2_5` for higher quality with low latency; `eleven_multilingual_v2` for broad multilingual quality.
- Endpoint: `POST /v1/text-to-speech/{voice_id}/stream` when the full text is available.
- Output: `mp3_44100_128` for shareable files; `pcm_16000`/`pcm_24000` for realtime playback pipelines.
- Latency: `optimize_streaming_latency=2` or `3`; avoid `4` unless the user accepts possible number/date mispronunciation, since that level also turns off text normalization.
- Voice settings: keep `use_speaker_boost=false` for low latency.
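
As a rough sketch, the defaults above could be assembled into a request like this. The function `build_stream_request` is hypothetical, and the wire-level details (the `https://api.elevenlabs.io` base URL, `output_format` and `optimize_streaming_latency` as query parameters, and the `model_id` and `voice_settings` body fields) are assumptions that should be checked against `references/elevenlabs-api.md` before use. The block only builds the URL and body; it sends nothing and never touches the API key.

```python
import json
from urllib.parse import quote, urlencode

API_BASE = "https://api.elevenlabs.io"  # assumed base URL


def build_stream_request(
    voice_id,
    text,
    model_id="eleven_flash_v2_5",
    output_format="mp3_44100_128",
    optimize_streaming_latency=2,
):
    """Return (url, body_bytes) for a streaming TTS request using the defaults above."""
    query = urlencode(
        {
            "output_format": output_format,
            "optimize_streaming_latency": optimize_streaming_latency,
        }
    )
    # Percent-encode the voice ID so it is always a single path segment.
    url = f"{API_BASE}/v1/text-to-speech/{quote(voice_id, safe='')}/stream?{query}"
    body = {
        "text": text,
        "model_id": model_id,
        # Speaker boost stays off, per the low-latency recommendation.
        "voice_settings": {"use_speaker_boost": False},
    }
    # ensure_ascii=False keeps non-ASCII text readable in the UTF-8 payload.
    return url, json.dumps(body, ensure_ascii=False).encode("utf-8")
```

The caller would add the API key as a request header at send time, read from the environment, so the secret never passes through this helper.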

## Quick commands

List voices:

```bash
ELEVENLABS_API_KEY="$ELEVENLABS_API_KEY" python3 skills/elevenlabs-tts/scripts/list_voices.py
```

HTTP streaming TTS benchmark:

```bash
ELEVENLABS_API_KEY="$ELEVENLABS_API_KEY" python3 skills/elevenlabs-tts/scripts/tts_stream_http.py \
  --text '野哥，ElevenLabs 流式语音测试成功。' \
  --voice-name Rachel \
  --model eleven_flash_v2_5 \
  --output-format mp3_44100_128 \
  --optimize-streaming-latency 2 \
  --out /tmp/elevenlabs-skill-test.mp3
```

If `--voice-name` is omitted, the script uses the first voice returned by `/v1/voices`.
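
The selection rule can be illustrated with a small helper. This is a sketch, not the script's actual logic: `pick_voice` is a hypothetical name, the `voice_id` and `name` keys are assumed to be the fields of each entry returned by `/v1/voices`, and the case-insensitive name match is an assumption made for convenience.

```python
def pick_voice(voices, name=None, voice_id=None):
    """Choose a voice by ID, then by name, else fall back to the first one."""
    if not voices:
        raise ValueError("no voices available")
    if voice_id:
        for voice in voices:
            if voice.get("voice_id") == voice_id:
                return voice
        raise LookupError(f"no voice with id {voice_id!r}")
    if name:
        wanted = name.casefold()
        for voice in voices:
            if voice.get("name", "").casefold() == wanted:
                return voice
        raise LookupError(f"no voice named {name!r}")
    # Smoke-test fallback: first voice in the listing.
    return voices[0]


if __name__ == "__main__":
    sample = [
        {"voice_id": "id-1", "name": "Rachel"},
        {"voice_id": "id-2", "name": "Adam"},
    ]
    print(pick_voice(sample, name="rachel")["voice_id"])
```

Raising on an unknown name or ID, rather than silently falling back, avoids benchmarking a different voice than the one requested.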

## When to read more

- API contract, request/response details, WebSocket notes, and troubleshooting: `references/elevenlabs-api.md`.
- Deterministic scripts:
  - `scripts/list_voices.py`
  - `scripts/tts_stream_http.py`
