Name: ai-voice-cloning
Availability: InStock
Author: inference-sh
AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Inworld TTS-2 (100+ languages, emotion/non-verbal steering), Inworld TTS…
SKILL.md

Install the belt CLI skill: npx skills add belt-sh/cli

AI Voice Generation

Generate natural AI voices via inference.sh CLI.

Quick Start

Requires inference.sh CLI (belt). Install instructions

belt login

Generate speech

belt app run infsh/kokoro-tts --input '{
"prompt": "Hello! This is an AI-generated voice that sounds natural and engaging.",
"voice": "af_sarah"
}'

## Available Models

| Model | App ID | Best For |
|-------|--------|----------|
| **Inworld TTS-2** | `inworld/text-to-speech-2` | **100+ languages, emotion/non-verbal steering, delivery modes** |
| Inworld TTS 1.5 Max | `inworld/text-to-speech-1-5-max` | Low latency (<200ms), 15 languages |
| Inworld TTS 1.5 Mini | `inworld/text-to-speech-1-5-mini` | Ultra-low latency (~120ms), 15 languages, real-time |
| ElevenLabs TTS | `elevenlabs/tts` | Premium quality, 22+ voices, 32 languages |
| ElevenLabs Voice Changer | `elevenlabs/voice-changer` | Transform existing voice recordings |
| Kokoro TTS | `infsh/kokoro-tts` | Natural, multiple voices |
| DIA | `infsh/dia-tts` | Conversational, expressive |
| Chatterbox | `infsh/chatterbox` | Casual, entertainment |
| Higgs | `infsh/higgs-tts` | Professional narration |
| VibeVoice | `infsh/vibevoice` | Emotional range |

## Kokoro Voice Library

### American English

| Voice ID | Gender | Style |
|----------|--------|-------|
| `af_sarah` | Female | Warm, friendly |
| `af_nicole` | Female | Professional |
| `af_sky` | Female | Youthful |
| `am_michael` | Male | Authoritative |
| `am_adam` | Male | Conversational |
| `am_echo` | Male | Clear, neutral |

### British English

| Voice ID | Gender | Style |
|----------|--------|-------|
| `bf_emma` | Female | Refined |
| `bf_isabella` | Female | Warm |
| `bm_george` | Male | Classic |
| `bm_lewis` | Male | Modern |

## Inworld TTS — Character & Emotion Voices

Inworld TTS-2 is purpose-built for character voices, gaming, and expressive speech. Use `[brackets]` inline for emotion, non-verbals, and delivery control:

```bash
# Expressive character voice with emotion steering
belt app run inworld/text-to-speech-2 --input '{
  "text": "[excited] Oh wow, you actually found the ancient artifact! [gasp] I cannot believe it... [whisper] We need to keep this between us.",
  "voice_id": "Sarah",
  "delivery_mode": "CREATIVE"
}'

# Calm narrator with stable delivery
belt app run inworld/text-to-speech-2 --input '{
  "text": "The sun set behind the mountains, casting long shadows across the valley. A new chapter was about to begin.",
  "voice_id": "Sarah",
  "delivery_mode": "STABLE"
}'

Delivery modes: STABLE (consistent, narration), BALANCED (natural, default), CREATIVE (expressive, characters)

Steering examples: [laugh], [sigh], [whisper], [excited], [sad], [angry], [pause], [gasp]

Built-in voices (271+ across 15 languages): Sarah, Alex, Ashley, Dennis, Hana, Blake, Luna, Clive, and many more. Browse all at the Inworld TTS Playground.

Low-Latency for Real-Time / Conversational AI

# Ultra-fast response for chatbots & game NPCs (~120ms)
belt app run inworld/text-to-speech-1-5-mini --input '{
  "text": "Welcome, traveler. What brings you to our village?",
  "voice_id": "Clive",
  "speaking_rate": 0.9
}'

Voice Generation Examples

Professional Narration

belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to our quarterly earnings call. Today we will discuss the financial performance and strategic initiatives for the past quarter.",
  "voice": "am_michael",
  "speed": 1.0
}'

Conversational Style

belt app run infsh/dia-tts --input '{
  "text": "Hey, so I was thinking about that project we discussed. What if we tried a different approach?",
  "voice": "conversational"
}'

Audiobook Narration

belt app run infsh/kokoro-tts --input '{
  "prompt": "Chapter One. The morning mist hung low over the valley as Sarah made her way down the winding path. She had been walking for hours.",
  "voice": "bf_emma",
  "speed": 0.9
}'

Video Voiceover

belt app run infsh/kokoro-tts --input '{
  "prompt": "Introducing the next generation of productivity. Work smarter, not harder.",
  "voice": "af_nicole",
  "speed": 1.1
}'

Podcast Host

belt app run infsh/kokoro-tts --input '{
  "prompt": "Welcome back to Tech Talk! Im your host, and today we are diving deep into the world of artificial intelligence.",
  "voice": "am_adam"
}'

Multi-Voice Conversation

# Generate dialogue between two speakers
# Speaker 1
belt app run infsh/kokoro-tts --input '{
  "prompt": "Have you seen the latest AI developments? Its incredible how fast things are moving.",
  "voice": "am_michael"
}' > speaker1.json

# Speaker 2
belt app run infsh/kokoro-tts --input '{
  "prompt": "I know, right? Just last week I tried that new image generator and was blown away.",
  "voice": "af_sarah"
}' > speaker2.json

# Merge conversation
belt app run infsh/media-merger --input '{
  "audio_files": ["<speaker1-url>", "<speaker2-url>"],
  "crossfade_ms": 300
}'

Long-Form Content

Chunked Processing

For content over 5000 characters, split into chunks:

# Process long text in chunks
TEXT="Your very long text here..."

# Split and generate
# Chunk 1
belt app run infsh/kokoro-tts --input '{
  "prompt": "<chunk-1>",
  "voice": "bf_emma"
}' > chunk1.json

# Chunk 2
belt app run infsh/kokoro-tts --input '{
  "prompt": "<chunk-2>",
  "voice": "bf_emma"
}' > chunk2.json

# Merge chunks
belt app run infsh/media-merger --input '{
  "audio_files": ["<chunk1-url>", "<chunk2-url>"],
  "crossfade_ms": 100
}'

Voice + Video Workflow

Add Voiceover to Video

# 1. Generate voiceover
belt app run infsh/kokoro-tts --input '{
  "prompt": "This stunning footage shows the beauty of nature in its purest form.",
  "voice": "am_michael"
}' > voiceover.json

# 2. Merge with video
belt app run infsh/media-merger --input '{
  "video_url": "https://your-video.mp4",
  "audio_url": "<voiceover-url>"
}'

Create Talking Head

# 1. Generate speech
belt app run infsh/kokoro-tts --input '{
  "prompt": "Hi, Im excited to share some updates with you today.",
  "voice": "af_sarah"
}' > speech.json

# 2. Animate with avatar
belt app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://portrait.jpg",
  "audio_url": "<speech-url>"
}'

Speed and Pacing

Speed
Effect
Use For

0.8
Slow, deliberate
Audiobooks, meditation

0.9
Slightly slow
Education, tutorials

1.0
Normal
General purpose

1.1
Slightly fast
Commercials, energy

1.2
Fast
Quick announcements

# Slow narration
belt app run infsh/kokoro-tts --input '{
  "prompt": "Take a deep breath. Let yourself relax.",
  "voice": "bf_emma",
  "speed": 0.8
}'

Punctuation for Pacing

Use punctuation to control speech rhythm:

Punctuation
Effect

Period .
Full pause

Comma ,
Brief pause

...
Extended pause

!
Emphasis

?
Question intonation

-
Quick break

belt app run infsh/kokoro-tts --input '{
  "prompt": "Wait... Did you hear that? Something is coming. Something big!",
  "voice": "am_adam"
}'

Best Practices

Match voice to content - Professional voice for business, casual for social

Use punctuation - Control pacing with periods and commas

Keep sentences short - Easier to generate and sounds more natural

Test different voices - Same text sounds different across voices

Adjust speed - Slightly slower often sounds more natural

Break long content - Process in chunks for consistency

Use Cases

Voiceovers - Video narration, commercials

Audiobooks - Full book narration

Podcasts - AI hosts and guests

E-learning - Course narration

Accessibility - Screen reader content

IVR - Phone system messages

Content localization - Translate and voice

Related Skills

# ElevenLabs TTS (premium, 22+ voices)
npx skills add inference-sh/skills@elevenlabs-tts

# ElevenLabs voice changer (transform recordings)
npx skills add inference-sh/skills@elevenlabs-voice-changer

# All TTS models
npx skills add inference-sh/skills@text-to-speech

# Podcast creation
npx skills add inference-sh/skills@ai-podcast-creation

# AI avatars
npx skills add inference-sh/skills@ai-avatar-video

# Video generation
npx skills add inference-sh/skills@ai-video-generation

# Full platform skill
npx skills add inference-sh/skills@infsh-cli

Browse audio apps: belt app store --category audio
28
ai-voice-cloning

SKILL.md

related skills