Fix Chinese polyphone (多音字) mispronunciation in TTS by auto-detecting ambiguous characters and applying pinyin annotations. Use when users complain about wro...
---
name: senseaudio-polyphone-tts
description: Fix Chinese polyphone (多音字) mispronunciation in TTS by auto-detecting ambiguous characters and applying pinyin annotations. Use when users complain about wrong pronunciation, need precise tone control, or are synthesizing text with characters like 行/干/量/好/了/得/地/的/着/过. Triggers on "读音不对", "这个字读错了", "多音字", "标注拼音", "银行行长", "绕口令", or any request to correct TTS pronunciation.
metadata:
openclaw:
requires:
env:
- SENSEAUDIO_API_KEY
bins:
- curl
- jq
- xxd
primaryEnv: SENSEAUDIO_API_KEY
homepage: https://senseaudio.cn
compatibility:
required_credentials:
- name: SENSEAUDIO_API_KEY
description: API key from https://senseaudio.cn/platform/api-key
env_var: SENSEAUDIO_API_KEY
---
# SenseAudio Polyphone TTS (多音字)
Precise pronunciation control for Chinese TTS via pinyin annotation. The `dictionary` parameter lets you override how specific characters are read — essential for polyphones (多音字) that the model might guess wrong.
> The `dictionary` parameter only works with **cloned voices** and **model `SenseAudio-TTS-1.5`**. System voices (male_0004_a etc.) do not support it.
## Step 1: Scan for Polyphones
When the user provides text, scan it for these common polyphones and flag any that appear:
| Character | Readings | Context clues |
|-----------|----------|---------------|
| 行 | háng (行业/银行/行列) / xíng (行走/行动/可行) | 银行、行长、行业 → háng |
| 干 | gān (干净/干燥) / gàn (干活/干部) | 干部、干活 → gàn |
| 量 | liáng (量体温/测量) / liàng (数量/重量) | 数量、质量 → liàng |
| 铺 | pū (铺床/铺路) / pù (店铺/铺子) | 店铺、铺面 → pù |
| 好 | hǎo (好的/很好) / hào (好奇/爱好) | 爱好、好学 → hào |
| 了 | le (吃了/来了) / liǎo (了解/了结) | 了解、了不起 → liǎo |
| 得 | de (跑得快) / dé (得到) / děi (得去) | 得到 → dé;必须 → děi |
| 地 | de (慢慢地) / dì (土地/地方) | 副词用法 → de |
| 的 | de (我的) / dí (的确) / dì (目的) | 目的、的确 → dì/dí |
| 着 | zhe (看着) / zháo (着火) / zhuó (着装) | 着火、着急 → zháo;着装 → zhuó |
| 长 | cháng (长度/很长) / zhǎng (成长/行长) | 行长、生长 → zhǎng |
| 重 | zhòng (重量/重要) / chóng (重复/重新) | 重复、重新 → chóng |
| 中 | zhōng (中间/中国) / zhòng (中奖/中毒) | 中奖、中毒 → zhòng |
| 还 | hái (还有/还是) / huán (还钱/归还) | 还钱、偿还 → huán |
| 发 | fā (发现/发展) / fà (头发/理发) | 头发、理发 → fà |
| 数 | shù (数字/数量) / shǔ (数数/数一数) | 数数、数落 → shǔ |
| 参 | cān (参加/参考) / shēn (人参/党参) | 人参、党参 → shēn |
| 差 | chā (差别/差距) / chà (差不多) / chāi (出差) | 出差 → chāi;差不多 → chà |
Show the user which polyphones were found and your best guess at the intended reading, then ask them to confirm or correct before synthesizing.
**Example:**
```
检测到多音字:
- "行" (第2个): 银行 → 建议读 háng [hang2] ✓ 还是 xíng [xing2]?
- "行" (第4个): 行长 → 建议读 zhǎng [zhang3] ✓ 还是 cháng [chang2]?
```
## Step 2: Build the Dictionary
Convert confirmed readings into the `dictionary` array. Each entry covers one phrase containing the polyphone:
```
原文片段 → replacement 格式:在多音字前加 [pinyin],其余字保持原样
```
**Pinyin format:** `[声母韵母声调数字]` — e.g., `[hang2]`、`[xing2]`、`[zhang3]`
**Example:**
- original: `银行行长`
- replacement: `银[hang2]行[zhang3]长`
Build the full dictionary array:
```json
"dictionary": [
{"original": "银行行长", "replacement": "银[hang2]行[zhang3]长"},
{"original": "好奇心", "replacement": "[hao4]奇心"}
]
```
Each `original` should be a short phrase (3–8 chars) that uniquely identifies the occurrence in context. Avoid single-character originals — they may match unintended occurrences.
## Step 3: Synthesize
The user must provide a cloned voice ID. If they don't have one, remind them that `dictionary` requires a cloned voice and suggest using the `senseaudio-voice-cloner` skill first.
```bash
curl -s -X POST https://api.senseaudio.cn/v1/t2a_v2 \
-H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "SenseAudio-TTS-1.5",
"text": "<TEXT>",
"stream": false,
"voice_setting": {
"voice_id": "<CLONED_VOICE_ID>"
},
"audio_setting": {
"format": "mp3"
},
"dictionary": <DICTIONARY_ARRAY>
}' -o response.json
jq -r '.data.audio' response.json | xxd -r -p > output.mp3
```
Check `base_resp.status_code == 0` before decoding.
## Step 4: Iterate
After the user listens, they may find additional mispronunciations. Update the `dictionary` array and re-synthesize. Keep the previous `response.json` until the new one succeeds.
Report: file path, duration (`jq '.extra_info.audio_length' response.json` ms), character count, and which dictionary entries were applied.
don't have the plugin yet? install it then click "run inline in claude" again.