---
name: elevenlabs-speech
description: Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
---
# ElevenLabs Speech
A complete voice solution that covers both TTS and STT through one API:
- **TTS**: Text-to-Speech (high-quality voices)
- **STT**: Speech-to-Text via Scribe (accurate transcription)
## Quick Start
### Environment Setup
Set your API key:
```bash
export ELEVENLABS_API_KEY="sk_..."
```
Alternatively, set it in a `.env` file in the workspace root.
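If you call the API from your own code, the key lookup described above can be sketched as follows. This is a minimal sketch, not the bundled scripts' actual loader: `load_api_key` is a hypothetical helper, and the simple `KEY=value` parsing of `.env` is an assumption.

```python
import os
from pathlib import Path


def load_api_key(env_path=".env", var_name="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, falling back to a .env file.

    Hypothetical helper: the bundled scripts may resolve the key differently.
    """
    key = os.environ.get(var_name)
    if key:
        return key

    path = Path(env_path)
    if path.is_file():
        for raw_line in path.read_text(encoding="utf-8").splitlines():
            line = raw_line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Tolerate shell-style "export KEY=value" lines.
            if line.startswith("export "):
                line = line[len("export "):]
            name, sep, value = line.partition("=")
            if sep and name.strip() == var_name:
                # Drop surrounding whitespace and quotes.
                return value.strip().strip("\"'")

    raise RuntimeError(f"{var_name} is not set and was not found in {env_path}")
```

Raising early when the key is missing gives a clearer error than a failed API request later on.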
### Text-to-Speech (TTS)
Convert text to natural-sounding speech:
```bash
python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3
```
With custom voice:
```bash
python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3
```
### List Available Voices
```bash
python scripts/elevenlabs_speech.py voices
```
## Using in Code
```python
from scripts.elevenlabs_speech import ElevenLabsClient

client = ElevenLabsClient(api_key="sk_...")

# Basic TTS
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3",
)

# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3",
)

# Get available voices and print each name with its ID
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")
```
## Popular Voices
| Voice ID | Name | Description |
|----------|------|-------------|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Natural, versatile (default) |
| `AZnzlk1XvdvUeBnXmlld` | Domi | Strong, energetic |
| `EXAVITQu4vr4xnSDxMaL` | Bella | Soft, soothing |
| `ErXwobaYiN019PkySvjV` | Antoni | Well-rounded |
| `MF3mGyEYCl7XYWbV9V6O` | Elli | Warm, friendly |
| `TxGEqnHWrfWFTfGW9XjX` | Josh | Deep, calm |
| `VR6AewLTigWG4xSOukaG` | Arnold | Authoritative |
## Voice Settings
- **stability** (0-1): lower values give a more emotional, variable delivery; higher values give a more stable one
- **similarity_boost** (0-1): higher values keep the output closer to the original voice
Defaults: `stability=0.5`, `similarity_boost=0.75`
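Both settings must stay within the 0-1 range, so it can help to validate them before a request. The following is a minimal sketch; `make_voice_settings` is a hypothetical helper and not part of the bundled scripts.

```python
def make_voice_settings(stability=0.5, similarity_boost=0.75):
    """Validate the two voice settings and return them as a dict.

    Hypothetical helper; the defaults mirror the values documented above.
    """
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "stability": float(stability),
        "similarity_boost": float(similarity_boost),
    }


# A more expressive delivery: lower stability, default similarity.
expressive = make_voice_settings(stability=0.3)
```

The resulting dict can be unpacked into the call shown earlier, for example `client.text_to_speech(text="Hello", output_path="out.mp3", **expressive)`.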
## Models
- `eleven_turbo_v2_5` - Fast, high quality (default)
- `eleven_multilingual_v2` - Best for non-English
- `eleven_monolingual_v1` - English only
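One way to apply the guidance above is to choose the model name from the language of the text. This is only a sketch: `pick_model` is a hypothetical helper, and how a model name is passed to the bundled script or client is not shown in this document, so the function just returns the name.

```python
DEFAULT_MODEL = "eleven_turbo_v2_5"            # fast, high quality
MULTILINGUAL_MODEL = "eleven_multilingual_v2"  # best for non-English


def pick_model(language="en"):
    """Return a model name for a language code such as "en", "en-US" or "ara".

    Hypothetical helper: English text gets the default model, anything
    else gets the multilingual one.
    """
    # Normalise "en_US" / "en-US" to the bare language part.
    code = language.lower().replace("_", "-").split("-")[0]
    if code in {"en", "eng"}:
        return DEFAULT_MODEL
    return MULTILINGUAL_MODEL
```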
## Integration with Telegram
When the user sends text and wants a voice reply:
```python
# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")
# Send via the Telegram message tool, pointing at the file generated above
message(action="send", media="reply.mp3", as_voice=True)
```
## Speech-to-Text (STT) with ElevenLabs Scribe
Transcribe voice messages using ElevenLabs Scribe:
### Transcribe Audio
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg
```
With specific language:
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --language ara
```
With speaker diarization (multiple speakers):
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2
```
### Using in Code
```python
from scripts.elevenlabs_scribe import ElevenLabsScribe
client = ElevenLabsScribe(api_key="sk_...")
# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])
# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")
# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)
```
### Supported Formats
- mp3, mp4, mpeg, mpga, m4a, wav, webm
- Max file size: 100 MB
- Works great with Telegram voice messages (`.ogg`)
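The limits above can be checked locally before uploading a file. The following is a minimal sketch with a hypothetical `check_audio_file` helper. It assumes that `.ogg` is accepted (based on the Telegram note above) and that 100 MB means 100 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Extensions from the list above, plus .ogg (assumed from the Telegram note).
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm", ".ogg",
}
MAX_FILE_BYTES = 100 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_audio_file(path):
    """Raise if the file is missing, has an unlisted extension, or is too large.

    Returns the file size in bytes when all checks pass.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"No such audio file: {p}")
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size > MAX_FILE_BYTES:
        raise ValueError(f"File is {size} bytes; the limit is {MAX_FILE_BYTES}")
    return size
```

Calling `check_audio_file("voice_message.ogg")` before `transcribe` fails fast instead of waiting for a rejected upload.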
### Language Support
Scribe supports 99 languages including:
- Arabic (`ara`)
- English (`eng`)
- Spanish (`spa`)
- French (`fra`)
- And many more...
Without a language hint, Scribe auto-detects the language.
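If your application tracks languages by name, a small lookup can turn them into the codes listed above and fall back to auto-detection when no language is known. This is a sketch only: `transcribe_kwargs` is a hypothetical helper, and the table covers just the four codes named in this document.

```python
# Only the codes listed above; extend as needed.
LANGUAGE_CODES = {
    "arabic": "ara",
    "english": "eng",
    "spanish": "spa",
    "french": "fra",
}


def transcribe_kwargs(language=None):
    """Build keyword arguments for transcribe().

    Returns an empty dict when no language is given, so auto-detection
    is used. Hypothetical helper, not part of the bundled scripts.
    """
    if language is None:
        return {}
    code = LANGUAGE_CODES.get(language.strip().lower())
    if code is None:
        raise ValueError(f"No language code known for {language!r}")
    return {"language_code": code}
```

Usage might look like `client.transcribe("voice_message.ogg", **transcribe_kwargs("Arabic"))`.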
## Complete Workflow Example
**User sends voice message → You reply with voice:**
```python
from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient
# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']
# 2. Process/understand the text
# ... your logic here ...
# 3. Generate response text
response_text = "Your response here"
# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")
# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)
```
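The same workflow can be wrapped in one function so that each step can be swapped out or tested without network access. This is a minimal sketch: `voice_reply` and its parameters are hypothetical, and it assumes the transcription result is a dict with a `text` key, as in the example above.

```python
def voice_reply(audio_path, transcribe, respond, synthesize,
                reply_path="reply.mp3"):
    """Transcribe a voice message, build a reply, and synthesize it.

    transcribe(audio_path)               -> dict with a "text" key
    respond(user_text)                   -> reply text
    synthesize(reply_text, output_path=) -> writes the audio file

    Returns the reply path, or None if nothing was transcribed.
    """
    user_text = transcribe(audio_path)["text"].strip()
    if not user_text:
        # Empty transcription: skip synthesis rather than voice an empty reply.
        return None
    synthesize(respond(user_text), output_path=reply_path)
    return reply_path
```

With the clients from the example, a call might look like `voice_reply("user_voice.ogg", stt.transcribe, my_logic, tts.text_to_speech)`, where `my_logic` stands for your own function. The returned path can then be passed to the message tool.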
## Pricing
Check https://elevenlabs.io/pricing for current rates:
**TTS (Text-to-Speech):**
- Free tier: 10,000 characters/month
- Paid plans available
**STT (Speech-to-Text) - Scribe:**
- Free tier available
- Check website for current pricing