小米 MiMo TTS

Text-to-speech using Xiaomi MiMo TTS API. Generates WAV audio files. Triggers when user says "send voice message", "voice reply", "read to me", "use clip voi...

installs

stars

karma

SkillRank score ↗

5.2/ 10

evaluated by implexa, claude-haiku-4-5 · 2026-06-01

mimo-tts wraps xiaomi's text-to-speech api with support for chinese voices, emotional styles (happy, angry, whisper), speed control, and fine-grained audio tags. outputs wav files from natural language triggers like 'send voice message' or 'read to me'.

structure

4.0

trigger phrases

7.0

procedure

6.0

edge cases

2.0

documentation

5.0

strengths

SKILL.md

---
name: mimo-tts
description: Text-to-speech using Xiaomi MiMo TTS API. Generates WAV audio files. Triggers when user says "send voice message", "voice reply", "read to me", "use clip voice", or any TTS-related request. Supports style control and fine-grained audio tags.
---

# Xiaomi MiMo TTS

## Quick Usage

Just say "send voice" + what you want me to say, or describe the voice style you want.

## Default Config

- **Default Voice**: `default_zh` (Chinese female)
- **Default Style**: `<style>夹子音</style>` (cute/clip voice, used when no style specified)

## Available Voices

| Voice Name | voice parameter |
|------------|-----------------|
| MiMo-Default | `mimo_default` |
| MiMo-Chinese-Female | `default_zh` |
| MiMo-English-Female | `default_eh` |

## Style Control

### Overall Style (at the beginning of text)

| Style Type | Examples |
|------------|----------|
| Speed Control | 变快 (faster) / 变慢 (slower) |
| Emotion | 开心 (happy) / 悲伤 (sad) / 生气 (angry) |
| Character | 孙悟空 (Wukong) / 林黛玉 (Lin Daiyu) |
| Style Variations | 悄悄话 (whisper) / 夹子音 (clip voice) / 台湾腔 (Taiwanese accent) |
| Dialect | 东北话 (Northeast) / 四川话 (Sichuan) / 河南话 (Henan) / 粤语 (Cantonese) |

**Format**: `<style>style1 style2</style>text to synthesize`

### Audio Tags (Fine-grained Control)

Use () to annotate emotion, speed, pauses, breathing, etc:

| Tag | Description | Example |
|-----|-------------|---------|
| （紧张，深呼吸） | Multi-emotion combo | （紧张，深呼吸）呼……冷静，冷静 |
| （语速加快） | Speed change | （语速加快，碎碎念） |
| （小声） | Volume control | （小声）哎呀，领带歪没歪？ |
| （长叹一口气） | Sigh | （长叹一口气） |
| （咳嗽） | Cough | （咳嗽）简直能把人骨头冻透了 |
| （沉默片刻） | Pause | （沉默片刻） |
| （苦笑） | Bitter smile | （苦笑）呵，没如果了 |
| （提高音量喊话） | Loud shout | （提高音量喊话）大姐！这鱼新鲜着呢！ |
| （极其疲惫，有气无力） | Exhausted | 师傅……到地方了叫我一声…… |
| （寒冷导致的急促呼吸） | Environmental | 呼——呼——这、这大兴安岭的雪…… |

**Synthesis Example**:

```python
import os
import base64
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("MIMO_API_KEY"),
    base_url="https://api.xiaomimimo.com/v1"
)

# Clip voice style
text = "<style>夹子音</style>主人～我来啦！今天有什么需要帮忙的吗～"

completion = client.chat.completions.create(
    model="mimo-v2-tts",
    messages=[
        {"role": "user", "content": "你好"},
        {"role": "assistant", "content": text}
    ],
    audio={"format": "wav", "voice": "default_zh"}
)

audio_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```

## Notes

- Target text must be in the `assistant` role message, not in `user`
- `<style>` tag must be at the beginning of target text
- For singing: `<style>唱歌</style>target text`
- Returns base64-encoded WAV audio

## Script Usage

Use `scripts/mimo_tts.py` for speech synthesis:

```bash
MIMO_API_KEY=your_api_key python3 scripts/mimo_tts.py "text to synthesize" --voice default_zh --style "夹子音" --output output.wav
```

**Note**: Set `MIMO_API_KEY` environment variable or configure in OpenClaw settings.

don't have the plugin yet? install it then click "run inline in claude" again.

小米 MiMo TTS

SKILL.md

related skills