---
name: music-craft-minimax
version: 1.1.0
description: Advanced music generation for OpenClaw, using the MiniMax Music 2.6 token plan. Use for cover and style transfer, two-song mashup, lyrics generation API, emotion-driven prompt engineering, and fine control via the `mmx` CLI. Extends `music-craft` with MiniMax-specific features.
metadata: '{"openclaw":{"requires":{"env":["MINIMAX_API_KEY"],"bins":["python3","ffmpeg","yt-dlp","mmx"]},"primaryEnv":"MINIMAX_API_KEY","emoji":"\ud83c\udfb6","homepage":"https://github.com/LuisCharro/skills/tree/main/publish/music-craft-minimax","envVars":[{"name":"MINIMAX_API_KEY","required":true,"description":"API key for the MiniMax Music 2.6 token plan. Required for cover, mashup, lyrics generation, and mmx flag control."}]}}'
---
# Music Craft — MiniMax
This is the **power-user upgrade** of [`music-craft`](../music-craft/). It does everything that skill does, plus the features that require the MiniMax Music 2.6 token plan:
- **Cover and style transfer** from a reference audio file or YouTube URL (preserves melody)
- **Two-song mashup** (Song A's content and emotion + Song B's style)
- **Lyrics generation** via the MiniMax API endpoint (with edit mode for iteration)
- **Emotion analysis** on input audio to drive prompt construction (vocal speed, intensity curve, pitch bends)
- **Fine control** over generation parameters (BPM, key, structure, avoid list as separate flags via `mmx`)
For everything else (standard song generation, instrumentation, anti-sparse prompt engineering, structure tags, user preference flow), this skill uses the same workflow as `music-craft`. Read that skill first to understand the base, then come back here for the MiniMax-specific extensions.
## Data, Consent, and Local Side Effects
This skill is allowed to process music media, but it should not surprise the user:
- **Cloud generation:** prompts, lyrics, reference audio, URLs, and cover/mashup inputs may be sent to MiniMax through `mmx` or MiniMax API calls.
- **Optional VLM captioning:** `--vlm` is opt-in and may send images or sampled video frames to an external/cloud vision service through `mmx vision describe`.
- **URL downloads:** YouTube, JioSaavn, and mx3.ch helpers download only URLs the user provided for the music task. They do not install dependencies automatically.
- **Local files:** analysis JSON, lyrics, prompts, temporary media, and generated audio are written to user-selected paths or temporary directories.
- **Overwrites:** helper scripts should refuse to replace existing user-visible outputs unless the operator passes an explicit `--overwrite` flag.
Before sending user-owned or third-party media to cloud services, state what will be uploaded and why, then wait for confirmation if the user has not already clearly requested that cloud workflow.
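
The overwrite rule above can be sketched as a small guard. This is a minimal illustration, not the helper scripts' actual code: the function name `resolve_output_path` and its `overwrite` parameter are hypothetical stand-ins for whatever the scripts do when they see `--overwrite`.

```python
from pathlib import Path


def resolve_output_path(path: str, overwrite: bool = False) -> Path:
    """Return the output path, refusing to clobber an existing file.

    Hypothetical helper: mirrors the rule that existing user-visible
    outputs are only replaced when the operator opts in explicitly.
    """
    target = Path(path).expanduser()
    if target.exists() and not overwrite:
        raise FileExistsError(
            f"{target} already exists; pass --overwrite to replace it"
        )
    return target
```

Raising instead of silently renaming keeps the decision with the operator, which matches the "should not surprise the user" intent of this section.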
## Required MiniMax Workflow
> **Operator rules for MiniMax generations:**
> - Run MiniMax generations sequentially, not in parallel.
> - Do not assume the CLI will honor a requested output path — verify the file exists after each run.
> - Treat requested duration as a target, not a guarantee, especially for lyric-heavy prompts.
> - Before running multi-output MiniMax generations, load [`references/minimax-generation-caveats.md`](references/minimax-generation-caveats.md).
For every cover, style-transfer, mashup, or precision `mmx` generation, follow this checklist in order:
1. **Analyze source audio** with `scripts/analysis_orchestrator.py` when audio is available.
2. **Build M1 + M2 prompts** from the analysis: M1 is the primary style, M2 is a strong contrast style.
3. **Lint both prompts** with `scripts/lint_music_request.py` before generation. Stop on blockers.
4. **Generate with retry** via `scripts/generate_with_retry.py -- music generate ...` or `-- music cover ...`.
5. **Verify outputs**: duration, LUFS/peak, file size, audible completeness, lyrics alignment if applicable.
6. **Finalize delivery copy** with `scripts/finalize_track.sh input.mp3 output.mp3` when the user wants production-ready loudness.
7. **Deliver both versions** with a short analysis summary and any caveats.
Preserve the anti-sparse guard in prompts: fully arranged, instruments keep playing, no a cappella dropouts unless explicitly requested.
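
The "sequential, then verify the file" operator rules can be sketched as a loop. This is an illustration under assumptions: `run_sequentially` and the `generate` callback are hypothetical names, and the callback stands in for whatever actually invokes the retry wrapper.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_sequentially(
    jobs: Iterable[tuple[str, Path]],
    generate: Callable[[str, Path], None],
) -> list[Path]:
    """Run generations one at a time and verify each output file.

    `generate` is a caller-supplied function (hypothetical here) that
    performs one generation, e.g. by invoking the retry wrapper.
    """
    verified = []
    for label, out_path in jobs:
        generate(label, out_path)  # blocks until this run finishes
        # Do not trust the requested path: confirm the file really exists.
        if not out_path.is_file() or out_path.stat().st_size == 0:
            raise RuntimeError(f"{label}: expected output missing at {out_path}")
        verified.append(out_path)
    return verified
```

The existence check only covers the "verify the file exists" rule; duration, loudness, and audible completeness still need the separate verification step in the checklist.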
## Routing and Blocker Checks
Classify the request before analysis or generation:
- **Text-only style reference** means the user gave a song name, artist, era, or genre cue without source audio. Treat it as style inference, not cover analysis.
- **Reference audio or YouTube** means the user provided a file or playable source that should be analyzed.
- **Cover** preserves melody and usually needs a source file plus a target style decision.
- **Style transfer** uses a reference track or analyzed audio as style input, then changes the production direction.
- **Mashup** needs Song A and Song B, plus a decision about which one contributes content and which one contributes style.
- **Emotion prompt** means the user wants analysis turned into descriptive prompt language, not a full cover.
The [`scripts/lint_music_request.py`](scripts/lint_music_request.py) helper emits one of these routes:
| Route | When |
|---|---|
| `base_prompt` | Standard generation, no MiniMax-specific feature needed. |
| `minimax_cover` | Melody-preserving cover from audio or YouTube. |
| `minimax_mashup` | Two-song mashup (A + B, both identified). |
| `minimax_style_transfer` | Style transfer that does not preserve the source melody. |
| `minimax_emotion_prompt` | Emotion analysis, or precision `mmx` flag usage. |
| `needs_clarification` | At least one blocker is unresolved; ask the user first. |
Surface blockers before analysis:
- no source file or usable URL
- unclear which track is Song A versus Song B
- missing target style
- missing lyrics decision, such as original, translated, rewritten, or instrumental
- conflicting cover/style-transfer intent: the user asked for both "cover" (preserve melody) and "style transfer" (reproduce style) at once. These are mutually exclusive. Ask the user to pick one.
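
As a rough sketch of how the route table and blocker list fit together (not the actual logic of `scripts/lint_music_request.py`): the `Request` record, its field names, and the rule that target style and lyrics decision only block source-based routes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical intake record; field names are illustrative only."""
    intent: set = field(default_factory=set)  # e.g. {"cover"}, {"mashup"}
    has_source: bool = False
    ab_assigned: bool = False
    target_style: str = ""
    lyrics_decision: str = ""


def find_blockers(req: Request) -> list:
    """Collect unresolved blockers in the order listed above."""
    blockers = []
    needs_source = req.intent & {"cover", "mashup", "style_transfer"}
    if needs_source and not req.has_source:
        blockers.append("no source file or usable URL")
    if "mashup" in req.intent and not req.ab_assigned:
        blockers.append("unclear which track is Song A versus Song B")
    if needs_source and not req.target_style:
        blockers.append("missing target style")
    if needs_source and not req.lyrics_decision:
        blockers.append("missing lyrics decision")
    if {"cover", "style_transfer"} <= req.intent:
        blockers.append("conflicting cover/style-transfer intent")
    return blockers


def pick_route(req: Request) -> str:
    """Return one of the route names from the table above."""
    if find_blockers(req):
        return "needs_clarification"
    for intent, route in (
        ("mashup", "minimax_mashup"),
        ("cover", "minimax_cover"),
        ("style_transfer", "minimax_style_transfer"),
        ("emotion_prompt", "minimax_emotion_prompt"),
    ):
        if intent in req.intent:
            return route
    return "base_prompt"
```

Checking blockers before choosing a route means an ambiguous request always surfaces as `needs_clarification` rather than being routed on a guess.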
After you have prompt text and `mmx` flags, lint them together before generation:
- compare prompt BPM with `--bpm`
- compare prompt key with `--key`
- compare prompt structure line with `--structure`
- compare prompt duration with `--duration` (or implicit length expectation)
- compare prompt vocal mode with `--vocals`
- compare prompt language with `--language`
- compare prompt avoid language with `--avoid`
- stop when the prompt says one thing and the flags say another
- warn when prompt text exceeds 1800 UTF-8 bytes
- stop when prompt text exceeds 2000 UTF-8 bytes (observed API rejection at 2079 bytes)
If the user only has a text reference, route to the free-tool path in `references/free-tool-inputs.md` first. If the user has audio, analyze first and only then build the prompt. The linter returns a `retry_guidance` array with one hint per conflict so the operator can re-align prompt and flags on the next attempt.
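
Two of the lint checks above, the byte-size thresholds and the BPM comparison, can be sketched as follows. This is not the linter's real code: the function names are hypothetical, and the regex only recognizes phrases shaped like "96 BPM".

```python
import re

WARN_BYTES = 1800  # warn threshold from the list above
STOP_BYTES = 2000  # stop threshold from the list above


def check_prompt_length(prompt: str) -> str:
    """Classify a prompt as 'ok', 'warn', or 'stop' by UTF-8 byte size."""
    size = len(prompt.encode("utf-8"))
    if size > STOP_BYTES:
        return "stop"
    if size > WARN_BYTES:
        return "warn"
    return "ok"


def bpm_conflict(prompt: str, bpm_flag: int):
    """Return a retry hint when the prompt names a different BPM than --bpm.

    Simplification: only catches phrases like '96 BPM' in the prompt.
    """
    match = re.search(r"(\d{2,3})\s*bpm", prompt, flags=re.IGNORECASE)
    if match and int(match.group(1)) != bpm_flag:
        return f"prompt says {match.group(1)} BPM but --bpm is {bpm_flag}; align them"
    return None
```

Measuring encoded bytes rather than characters matters for non-English prompts, where accented or non-Latin characters take two or more bytes each.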
## When To Use
Use this skill when the task involves:
- generating a cover of an existing song with a different style (chanson version of a rock track, reggaeton version of a pop hit, and so on)
- style transfer from a YouTube URL or audio file to a target genre
- two-song mashup where Song A's lyrics and emotional arc are kept, but Song B's style is applied
- emotion analysis on input audio to extract intensity curves, vocal speed, pitch bends, and emotion classifications
- generating lyrics in a specific language and theme via the MiniMax `lyrics_generation` API
- editing existing lyrics to match a target style or emotional arc (MiniMax `lyrics_generation` edit mode)
- using `mmx` CLI directly for fine control over `--avoid`, `--bpm`, `--key`, `--structure`, `--vocals`, `--instruments` as separate flags
- accessing MiniMax's `music-cover` or `music-cover-free` models for melody preservation
## Request Intake (adapted for MiniMax features)
After the Routing and Blocker Checks classify the request, run this 2-pass intake to extract the full set of fields the user cares about. Label each field's confidence: **clear** (user said it), **inferred** (sensible default), **missing** (need to ask), or **conflicting** (user said two incompatible things — pause to resolve).
### Fields checklist (MiniMax-specific)
| # | Field | What to look for | MiniMax-specific notes |
|---|---|---|---|
| 1 | Route | Cover / style transfer / mashup / standard / emotion prompt | From the Routing and Blocker Checks section. Determines which MiniMax features to use. |
| 2 | Source audio or URL | File path or playable YouTube URL | Required for cover, mashup, style transfer. For standard, optional (text-only style reference is also fine). |
| 3 | Song A identity | Name, artist, audio | For mashup: needed. For cover: this is the source. |
| 4 | Song B identity | Name, artist, audio | For mashup only. |
| 5 | Target style | Genre / mood / reference | The destination of the cover or style transfer. If user says "like Rosalía", that's clear. If user says "something good", that's missing. |
| 6 | Lyrics decision | Original / translated / new / instrumental | For cover, default to original (translated if user requests it). For standard, default to new (or user-provided). |
| 7 | Vocal mode | Solo / duet / choir / instrumental | Drives `--vocals` and `--language` flags. |
| 8 | Language | BCP-47 code (en, fr, es, etc.) | For lyrics language AND vocal language. |
| 9 | Duration | Approximate length (jingle ~30s, standard ~3min, epic ~6min) | `--length` is a hint in milliseconds, not a guarantee. Length is still driven mainly by lyrics + structure. |
| 10 | BPM, key, structure | Exact values if user wants `--bpm`/`--key`/`--structure` | Optional. If provided, the prompt AND flags must agree (lint them). |
| 11 | Emotion arc | For emotion-prompt workflows: which emotions to emphasize | Drives the analysis-to-prompt translation. |
| 12 | **Output location** | Where the audio and analysis files go | Same as the base skill — per-song subfolder in `~/Music mix/<project>/<song-slug>/`. |
Confidence map example: [`references/examples.md`](references/examples.md).
If any field is **missing** or **conflicting**, that's a question to ask. The `Ambiguity Questions` section below has specific patterns for each route. If everything is **clear** or **inferred**, the request is ready to translate.
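
The confidence labels can be represented as a simple map from field name to label. The sketch below is illustrative only: the `Confidence` enum and both function names are hypothetical, and asking about conflicts before missing fields is an assumption about ordering.

```python
from enum import Enum


class Confidence(Enum):
    CLEAR = "clear"
    INFERRED = "inferred"
    MISSING = "missing"
    CONFLICTING = "conflicting"


def fields_to_ask(confidence_map: dict) -> list:
    """Return field names that still need a question, conflicts first."""
    conflicting = [f for f, c in confidence_map.items() if c is Confidence.CONFLICTING]
    missing = [f for f, c in confidence_map.items() if c is Confidence.MISSING]
    return conflicting + missing


def ready_to_translate(confidence_map: dict) -> bool:
    """True when every field is clear or inferred."""
    return not fields_to_ask(confidence_map)
```

For example, a map with `target_style` marked `MISSING` and everything else `CLEAR` yields a single question about target style.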
## User Preference Flow (message patterns → action)
The skill does not start with a questionnaire. It starts by reading and inferring from the user's natural-language request.
| User says... | Skill does... |
|---|---|
| "Haz un cover de X en Y" | Route: `minimax_cover`. Ask: source audio file (or download from YouTube), target language for lyrics, vocal register. |
| "Make this song sound like Rosalía" | Route: `minimax_style_transfer`. Ask: source audio, which album/era of Rosalía. |
| "I have audio of A, mash with B, keep A's melody" | Route: `minimax_mashup`. Ask: A vs B confirmation, source audio for A, B can be name or audio. |
| "Analyze the emotion curve of this track" | Route: `minimax_emotion_prompt` (analysis-only). Run `analysis_orchestrator.py --audio` first, then read the JSON. |
| "I want the lyrics to be about X, in French, melancholic" | Route: `base_prompt` (standard). Use the lyrics API to generate, then pass to `mmx music generate --lyrics-file`. Ask: target BPM/key/structure or derive from analysis. |
| "Recreate the song but in 90 BPM D minor" | Route: `base_prompt` with `mmx` flags. Lint prompt vs flags before generation. Verify BPM/key consistency. |
| "I don't know, surprise me" | Pick a coherent default (e.g. upbeat indie pop, EN, ~3min, auto-lyrics, standard generation) and confirm with the user before generating. |
| "Same song again but as a reggaeton version" | Route: `minimax_cover` with the existing song as source. Use the same project/song subfolder, suffix the MP3 (`M1_original.mp3` + `M2_reggaeton.mp3`). |
This table is a condensed version of `references/user-preference-flow.md`, which lives in the base skill. For more detailed cases, defer to the base skill's table and combine it with this skill's route mapping.
## Output File Layout (Per-Song Subfolders)
**MiniMax-specific additions** (drop these into the per-song subfolder alongside the base items):
| File | Source | Notes |
|---|---|---|
| `<song-slug>_analysis.json` | `analysis_orchestrator.py --output` | MiniMax-specific analysis results (emotion, BPM, key, segments) |
| `<song-slug>_lyrics.txt` | `mmx music generate --lyrics-file` | Optional if user provided lyrics inline |
| `<song-slug>_<style>_prompt.txt` | The exact text passed to `--prompt` | For reproducibility |
The LLM should aim for the base skill's layout by default. The MiniMax-specific files are added on top when MiniMax features are used (cover workflow, mashup, analysis, etc.).
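
A minimal sketch of how the three MiniMax-specific file names could be assembled inside the per-song subfolder. The function name is hypothetical, and `song_slug` and `style` are assumed to be already slugified by the base skill's rules, which are not reproduced here.

```python
from pathlib import Path


def minimax_file_paths(project: str, song_slug: str, style: str,
                       root: Path = Path("~/Music mix")) -> dict:
    """Build the MiniMax-specific file paths inside the per-song subfolder.

    `song_slug` and `style` are assumed to follow the base skill's slug
    rules; this helper does not slugify anything itself.
    """
    folder = root.expanduser() / project / song_slug
    return {
        "analysis": folder / f"{song_slug}_analysis.json",
        "lyrics": folder / f"{song_slug}_lyrics.txt",
        "prompt": folder / f"{song_slug}_{style}_prompt.txt",
    }
```

Keeping the prompt file next to the audio makes a generation reproducible from the folder alone.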
## Quick Start with the Orchestrator
One entry point for all input analysis:
```bash
python3 scripts/analysis_orchestrator.py --audio /tmp/song.wav
```
It routes audio files, two-song pairs, video, images, YouTube and JioSaavn
URLs, and combinations to the right extractors. Full per-input commands,
extraction guidance, and the per-song output layout:
[`references/orchestrator-quickstart.md`](references/orchestrator-quickstart.md).
## When NOT To Use
Do not use this skill when:
- the user only needs standard song generation without cover, mashup, or analysis — use `music-craft` instead (lighter, no MiniMax dependency)
- the runtime does not expose a `music_generate` tool and there is no `MINIMAX_API_KEY` configured — both skills need the runtime
- the user wants deterministic, single-shot generation with no iteration (this skill is overkill for that)
- the user wants to mutate a specific existing audio file (pitch shift, time stretch, stem split) — that is post-production, not generation
- the user is not on a MiniMax Token Plan — the advanced features (cover, mmx per-flag control, lyrics API, emotion-driven prompts) require the plan
- the user needs a reliable full-length 3:00+ song with exact duration — prefer `music-craft` with ACE-Step instead; MiniMax is the right tool when speed, convenience, cover workflows, mashups, or mmx flag control matter more than exact output length
## Decision Tree
Use the base skill unless one of these MiniMax-specific needs is present:
- melody-preserving cover or style transfer from audio or YouTube
- two-song mashup
- lyrics API preview/edit flow
- emotion analysis that feeds the prompt
- exact `mmx` control for BPM, key, structure, or avoid lists
If the user wants a new song that only borrows a style, stay in `music-craft` unless they also need exact flag control or lyrics API iteration.
If the source is a YouTube or JioSaavn URL and download is blocked, ask for a local file before changing the workflow.
## Audio Source Fallback Order
When the user provides a URL as the source for cover, mashup, or style transfer, try audio sources in this order:
| Priority | Source | URL patterns | When to use |
|---|---|---|---|
| 1 | **YouTube** | `youtube.com`, `youtu.be` | Global music, well-known international tracks |
| 2 | **JioSaavn** | `jiosaavn.com`, `www.jio.com/jiosaavn` | Bollywood, Hindi, Tamil, Telugu, Malayalam, Bengali, and other Indian regional music not on YouTube |
| 3 | **mx3.ch** | `mx3.ch` | Niche or regional sources with direct audio links |
| 4 | **Local file / alternate URL** | User-provided file path or other URL | When all cloud sources fail |
**JioSaavn specifics:**
- **URL pattern:** `https://www.jiosaavn.com/song/<slug>/<id>` or `https://www.jio.com/jiosaavn`
- **How to fetch:** `yt-dlp` handles JioSaavn URLs directly — no special auth required
- **When to prefer it:** When the user names a Bollywood song, Indian film song, or regional Indian track that is not easily found on YouTube. JioSaavn has deep catalog coverage for Indian music.
- **Coverage notes:** Best for Bollywood/Hindi film music and major South Indian film music. Smaller regional labels may not be available. If JioSaavn returns no audio, fall back to mx3.ch or ask for a local file.
- **Third-party disclosure:** Inform the user when downloading from JioSaavn: "I'll download this from JioSaavn to analyze the audio." This is the same consent practice as for YouTube.
**Download failure fallback chain:**
1. Try the URL with `yt-dlp` directly
2. If blocked, try updating `yt-dlp` (`pip install -U yt-dlp`), with the user's consent
3. If still blocked, fall to the next source in the priority list
4. If all cloud sources fail, ask for a local audio file
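
The URL patterns in the priority table can be matched by hostname. The sketch below is an assumption-laden illustration: `classify_source` is a hypothetical name, and it only decides which source a URL belongs to; it does not download anything.

```python
from urllib.parse import urlparse

# Priority order and URL patterns taken from the table above.
SOURCES = [
    ("youtube", ("youtube.com", "youtu.be")),
    ("jiosaavn", ("jiosaavn.com",)),
    ("mx3", ("mx3.ch",)),
]


def classify_source(url_or_path: str) -> str:
    """Map a source to 'youtube', 'jiosaavn', 'mx3', or 'local_or_other'."""
    parsed = urlparse(url_or_path)
    host = (parsed.hostname or "").lower()
    # Special case from the table: jio.com URLs under /jiosaavn.
    if host in ("jio.com", "www.jio.com") and parsed.path.startswith("/jiosaavn"):
        return "jiosaavn"
    for name, domains in SOURCES:
        # Match the domain itself or any subdomain, never a lookalike suffix.
        if any(host == d or host.endswith("." + d) for d in domains):
            return name
    return "local_or_other"
```

Matching on the parsed hostname rather than a substring avoids treating lookalike domains as a known source.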
## First Response Defaults
Use these defaults on the first pass:
- **Cover from audio or YouTube**: start with the one-step cover path. Switch to two-step only if the user wants translated lyrics, edited ASR lyrics, or custom lyrics.
- **Style transfer only**: do not use cover unless melody preservation matters. Use standard generation plus `mmx` flags if exact BPM/key/structure matter.
- **Two-song mashup**: anchor on Song A. If Song A has audio, default to the cover two-step workflow; if Song B is only named, ask for a short style description or fetch more context if free tools are available.
- **Lyrics API generation or edit**: use `write_full_song` for blank-page generation and `edit` for revisions.
- **Emotion-analysis-to-prompt**: run analysis first, then convert to a prompt; only ask whether the output should be cover, mashup, or standard generation, plus the target language if missing.
- **Exact BPM/key/structure control**: make `mmx` flags the source of truth and keep the prompt descriptive but non-conflicting.
## Ambiguity Questions
Ask at most three questions. Separate blockers from quality tweaks:
- Required blockers first: source file or URL, which song is A vs B, whether lyrics already exist, whether the output must preserve melody.
- Optional quality after blockers: target language, target style, BPM, key, structure, instruments, vocal color, avoid list.
Use these exact patterns when clarification is needed:
- **Cover**: "Which source should I use?" "Do you want the original lyrics, translated lyrics, or new lyrics?" "Any target style, or should I derive it from the source?"
- **Mashup**: "Which song is A and which is B?" "Do you have audio for Song B, or only the name?" "Should the lyrics stay the same or be rewritten?"
- **Lyrics API**: "Write from scratch or edit existing lyrics?" "What language should I target?" "Any hard structure requirements?"
- **Emotion prompt**: "Do you want cover, mashup, or standard generation?" "What language should the output use?" "Should I prioritize tenderness, energy, or structure?"
- **mmx precision**: "Which values are mandatory: BPM, key, structure, or avoid list?" "Any instruments or vocals that must stay in or stay out?"
## Relationship to `music-craft`
This skill **extends** the base skill; it does not replace it. The shared concepts are:
| Concept | Where it lives |
|---|---|
| Pre-Flight Check (platform detection) | This skill (extended required list) |
| Anti-sparse rules (canonical text) | Base skill, referenced from here |
| Prompt formula (production sheet) | Base skill, referenced from here |
| Structure tags (14 tags) | Base skill, referenced from here |
| User preference flow (auto-detect + ask) | Base skill, referenced from here |
| Output file layout (per-song subfolders, slug rules, version prefix) | Base skill, referenced from here; MiniMax adds analysis.json and lyrics.txt |
| Rate limits (generic) | Base skill |
| Quality verification checklist | Base skill, extended here for MiniMax |
| Operating rules (6-step loop) | Base skill, summarized here with MiniMax-specific extensions |
The MiniMax-specific additions are:
| MiniMax concept | Where it lives |
|---|---|
| `mmx` CLI quick reference | This skill |
| `mmx` full flag reference | This skill, [`references/mmx-flags-reference.md`](references/mmx-flags-reference.md) |
| Cover workflow (one-step, two-step) | This skill, [`references/cover-workflow.md`](references/cover-workflow.md) |
| Lyrics generation API | This skill, [`references/lyrics-generation.md`](references/lyrics-generation.md) |
| Mashup workflow (A + B) | This skill, [`references/mashup-workflow.md`](references/mashup-workflow.md) |
| Emotion analysis (vocal speed, intensity, pitch) | This skill, [`references/emotion-analysis.md`](references/emotion-analysis.md) |
| MiniMax-specific error handling | This skill, [`references/error-handling.md`](references/error-handling.md) |
| Audio analysis scripts | This skill, [`scripts/`](scripts/) |
| Free tool inputs (web, image, memory) | Both skills — base layer in [`music-craft`](../music-craft/), MiniMax layer here in [`references/free-tool-inputs.md`](references/free-tool-inputs.md) |
## Pre-Flight Check
Run the extended pre-flight in
[`references/setup-and-preflight.md`](references/setup-and-preflight.md)
before the first generation or analysis. Required: the `music_generate` tool,
`MINIMAX_API_KEY`, `python3`, and `mmx`. Never install anything without
explicit user consent. If a requirement is missing, ask the user; do not degrade silently.
## Free Tool Augmentation (Input Enrichment)
The OpenClaw runtime exposes several free tools (web_fetch, web_search, image analysis, memory, browser) that enrich the music generation workflow. The base layer is documented in [`music-craft` → Free Tool Augmentation](../music-craft/SKILL.md#free-tool-augmentation) and [`references/free-tool-inputs.md`](../music-craft/references/free-tool-inputs.md). This section shows how they compose with MiniMax-specific features.
Free-tool routing, blocker checks, prompt/flag lint, and MiniMax combos: [`references/free-tool-inputs.md`](references/free-tool-inputs.md).
## Operating Rules
Same 6-step loop as `music-craft`, with MiniMax-specific extensions:
1. **Read and auto-detect** — same
2. **Ask only the ambiguous parts** — same, plus ask if the user wants cover / mashup / standard
3. **Translate to a production-sheet prompt** — same, but consider whether to use `mmx` flags (see [`references/mmx-flags-reference.md`](references/mmx-flags-reference.md)) instead of packing everything into the prompt
4. **Structure the lyrics** — same, plus consider lyrics API for generation or edit (see [`references/lyrics-generation.md`](references/lyrics-generation.md))
5. **Generate and verify** — same, plus the `music-cover` model for melody preservation
6. **Iterate** — same, plus emotion analysis to inform the next prompt adjustment
For the full 6-step detail, see `music-craft` → Operating Rules.
## Song length (`--length` is a hint, not a guarantee)
`mmx music generate --length` accepts milliseconds as a **duration hint**. It is useful, but it is not precise. **Don't expect mmx to hit 3:30 exactly.** Output length varies by ±20-30s depending on lyrics, structure, model, and generation randomness. If you need precise length, ACE-Step is the right tool (it has `audio_duration`). If you want MiniMax's vocal quality and the song length is flexible, mmx is fine.
Details: [`references/mmx-flags-reference.md`](references/mmx-flags-reference.md).
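
Because `--length` takes milliseconds, a small conversion helps avoid unit mistakes. The helpers below are a sketch with hypothetical names; the 30-second default simply reflects the upper end of the drift described above.

```python
def to_length_ms(duration: str) -> int:
    """Convert 'M:SS' (or plain seconds) into milliseconds for --length."""
    if ":" in duration:
        minutes, seconds = duration.split(":")
        total_seconds = int(minutes) * 60 + int(seconds)
    else:
        total_seconds = int(duration)
    return total_seconds * 1000


def within_expected_drift(target_ms: int, actual_ms: int, drift_s: int = 30) -> bool:
    """True when the output is within the drift window around the target."""
    return abs(actual_ms - target_ms) <= drift_s * 1000
```

For a 3:30 target, `to_length_ms("3:30")` gives `210000`, and an output of 3:05 still counts as within the expected drift.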
## mmx CLI Quick Reference
The shape of a complete generation command:
```bash
mmx music generate \
--prompt "<production-sheet prompt>" \
--lyrics-file lyrics.txt \
--model music-2.6 \
--bpm 96 --key "D major" --structure "intro-verse-chorus-verse-chorus-bridge-chorus-outro" \
--vocals "<vocal description>" --genre "<genre>" --mood "<mood>" --instruments "<instruments>" \
--avoid "<what to avoid>" --out output.mp3
```
Full reference with all flags, examples, and model selection guidance: [`references/mmx-flags-reference.md`](references/mmx-flags-reference.md).
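
When the command is launched from Python, building it as an argument list avoids shell-quoting problems with long prompts. This is a sketch, not part of the skill's scripts: `build_generate_argv` is a hypothetical helper, and it only assembles the flags shown in the command above without validating them.

```python
def build_generate_argv(prompt: str, out: str, **flags) -> list:
    """Assemble an argv list for `mmx music generate` without running it.

    Keyword names map to flags by replacing '_' with '-', e.g.
    lyrics_file -> --lyrics-file. None values are skipped. The caller is
    responsible for passing only flags that `mmx` actually supports.
    """
    argv = ["mmx", "music", "generate", "--prompt", prompt]
    for name, value in flags.items():
        if value is None:
            continue
        argv += ["--" + name.replace("_", "-"), str(value)]
    argv += ["--out", out]
    return argv
```

The resulting list can be handed to `subprocess.run(argv, check=True)`, so values such as `"D major"` stay a single argument without manual quoting.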
## mmx Music Generation — verified patterns (June 2026)
- **Pattern A:** full song with a detailed prompt plus 6 metadata flags (`--vocals`, `--genre`, `--mood`, `--instruments`, `--bpm`, `--key`), for production-grade output.
- **Pattern B:** crazy combo experiments (e.g. opera vocals over heavy metal), run through the `scripts/generate_with_retry.py` wrapper.
- **Model selection:** the guidance covers `music-2.6` vs `music-2.6-free` and when to use each; the `--instrumental` and `--lyrics-optimizer` flags bypass the `--lyrics` requirement for BGM and auto-lyrics workflows.
- **Prompt length safety:** prompts over 2000 UTF-8 bytes fail with `invalid params, prompt length not valid`. Run `scripts/lint_music_request.py` before generation.
- **URL expiration:** when using `--output-format url`, the returned URL expires after 24 hours. Always use `--out` or download promptly.
Details: [`references/mmx-flags-reference.md`](references/mmx-flags-reference.md).
## Cover Workflow
Cover workflow preserves the original song's melody while applying a different style. Two paths exist: one-step (`mmx music cover` with prompt + audio, MiniMax extracts lyrics via ASR and applies the new style) and two-step (preprocess to get a `cover_feature_id`, edit ASR lyrics, then generate — better when lyrics need correction or the user wants different lyrics in the new style).
Full workflow with payloads, error handling, and use cases: [`references/cover-workflow.md`](references/cover-workflow.md).
## Lyrics Generation
MiniMax has a dedicated `lyrics_generation` endpoint that produces structured lyrics (with `[Verse]`, `[Chorus]`, etc. tags) from a theme prompt. Two modes:
- `write_full_song` — create new lyrics from a theme
- `edit` — modify existing lyrics (e.g., make the chorus stronger, shift to a hopeful ending)
The output is structured lyrics that can be passed directly to `music_generate` or `mmx music generate`.
Full detail with API examples, parameters, and use cases: [`references/lyrics-generation.md`](references/lyrics-generation.md).
## Web Lyrics Lookup (LRCLib)
As an optional complement to Whisper transcription, the orchestrator can look up song lyrics from **LRCLib** (open, no auth, JSON API at `https://lrclib.net/api`) when the song is a known mainstream track. This is a graceful fallback — Whisper is the primary source, LRCLib is a quality boost for the right song. When LRCLib is empty (the expected case for instrumentals), the script returns `no_web_lyrics` and the caller silently uses Whisper.
Full detail with CLI commands, `--lyrics-source` flag table, coverage notes, and scoring heuristic: [`references/lyrics-generation.md`](references/lyrics-generation.md).
## Mashup Workflow
The signature MiniMax-specific feature: combine Song A (content + emotion) with Song B (style).
Workflow:
1. Get Song A (audio file, YouTube URL, or song name)
2. Get Song B (audio file, YouTube URL, or song name)
3. Run emotion analysis on Song A (if audio available) to extract the emotional arc
4. Build a prompt that applies Song B's style to Song A's content and emotion
5. Generate using the cover workflow (preserves melody) or standard generation (creative reimagining)
This is the most powerful feature in this skill. The output preserves what makes Song A recognizable (lyrics, melody, emotion) while applying Song B's production style.
Full detail with the emotion-to-prompt conversion and the two-song analysis script: [`references/mashup-workflow.md`](references/mashup-workflow.md) and [`references/emotion-analysis.md`](references/emotion-analysis.md).
## Emotion Analysis
Emotion analysis extracts per-section features from input audio (intensity, pitch, vocal effort, breathiness, spectral centroid, emotion classification, repetitive intensification, emotional shifts, vocal speed, pitch bends). The analysis outputs JSON that the `emotion_to_prompt.py` script converts into a ready-to-use production-sheet prompt. Run analysis first when audio is available; use the local-only path (assemble prompt from JSON without the cloud helper) when MiniMax is blocked.
Full detail with detection cookbook, pipeline, scripts, local-only path, and the 25+ emotion set: [`references/emotion-analysis.md`](references/emotion-analysis.md). For emotion recipes in the OUTPUT, see [`references/emotion-delivery.md`](references/emotion-delivery.md).
## Analysis Quality (Summary Format, Confidence, Fallbacks)
Analysis scripts in `scripts/` produce different views (emotion, beats, melody, structure, instrumentation). The skill expects them to converge on a single compact summary so downstream code and humans can read the same shape regardless of which scripts ran.
### Compact Analysis Summary
Every analysis result should include a `summary` object with these keys:
| Key | Type | Meaning |
|---|---|---|
| `tempo` | string | BPM value with confidence, e.g. `120 BPM (confidence 0.92)` |
| `key` | string | Detected key, e.g. `E minor (confidence 0.71)` |
| `sections` | list | Section labels with timing, e.g. `[{"label": "verse", "start": 0.0, "end": 28.5}, ...]` |
| `instrumentation` | list | Detected instrument palette, e.g. `["electric guitar", "drums", "bass"]` |
| `vocal_traits` | dict | Breathiness, intensity, pitch range, e.g. `{"breathiness": "high", "intensity": "medium"}` |
| `energy_curve` | list | Per-section energy values, e.g. `[{"t": 0, "energy": 0.6}, ...]` |
| `hook_points` | list | Timestamps of detected hooks, e.g. `[12.4, 48.0]` |
| `mix_notes` | list | Short strings, e.g. `["vocal upfront", "wide stereo drums", "rolled-off highs"]` |
Scripts may add their own fields, but every script must return at least the keys above (use empty list / unknown string when a key has no data).
Confidence levels and fallback behavior for missing optional dependencies: [`references/advanced-audio-analysis.md`](references/advanced-audio-analysis.md).
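
The "every script returns at least these keys" contract can be enforced with a small normalizer. This is a sketch under assumptions: `normalize_summary` is a hypothetical name, and using an empty dict as the placeholder for `vocal_traits` is an assumption, since the text only names empty lists and the unknown string.

```python
REQUIRED_KEYS = {
    "tempo": str, "key": str, "sections": list, "instrumentation": list,
    "vocal_traits": dict, "energy_curve": list, "hook_points": list,
    "mix_notes": list,
}


def normalize_summary(summary: dict) -> dict:
    """Fill in any missing required key with an empty value of its type.

    Strings default to 'unknown'; lists and dicts default to empty.
    The empty-dict default for `vocal_traits` is an assumption.
    """
    normalized = dict(summary)  # keep script-specific extra fields
    for key, expected_type in REQUIRED_KEYS.items():
        value = normalized.get(key)
        if not isinstance(value, expected_type):
            normalized[key] = "unknown" if expected_type is str else expected_type()
    return normalized
```

Downstream code can then read any of the eight keys without checking which analysis scripts actually ran.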
## Rate Limits (MiniMax-specific)
Hard limits: 120 RPM, 20 concurrent connections, output URLs expire in 24 hours, cover feature IDs expire in 24 hours. Under Token Plan 3.0 (June 2026+), the actual ceiling is credit-based: **the documented 120 RPM is the API limit, but the Token Plan 3.0 quota is what determines your real ceiling.**
Full detail, Token Plan 3.0 credit-pool mechanics, 429 recovery steps, and the usage-check command: [`references/error-handling.md`](references/error-handling.md#rate-limits-minimax-specific).
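
For scripted batches, the 120 RPM figure works out to at least 0.5 seconds between requests. The pacer below is a hypothetical sketch of that arithmetic only; it does not track Token Plan credits, so it cannot by itself prevent quota-driven 429 responses.

```python
import time


class RequestPacer:
    """Space out sequential requests so they stay under a per-minute cap."""

    def __init__(self, rpm: int = 120, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # 120 RPM -> 0.5 s between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `pacer.wait()` before each sequential generation keeps the request rate under the documented API limit; the injectable `clock` and `sleep` exist only to make the class easy to test.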
## Anti-Sparse (MiniMax-Specific Deep Dive)
The base anti-sparse rules live in [`../music-craft/SKILL.md`](../music-craft/SKILL.md). MiniMax adds a more severe failure mode: **MiniMax interprets "sparse" or "minimal" as "remove all instruments"**, even more aggressively than other providers — never use those words in a prompt without pairing them with an explicit instrument list.
Full deep-dive with observed failure modes, mitigation steps, and the canonical phrase blocklist: [`references/error-handling.md`](references/error-handling.md#anti-sparse-minimax-specific-deep-dive).
## Quality Verification Checklist
Same 8-point checklist as the base skill, plus 4 MiniMax-specific items:
9. **Cover preserves melody recognisably.** If the user asked for Song X in the style of Song Y, the new version should be recognisable as Song X's melody with Song Y's style.
10. **Emotion curve matches Song A** (for mashups). The dynamic arc of the output should follow the original's intensity, not flatten to a single energy.
11. **`--avoid` flags are respected.** If the user said "no electronic sounds", the output should not have synths.
12. **Per-flag control worked** (BPM, key, structure). If the user asked for 80 BPM in E minor, the output should be in that range, not "close enough".
## Output Verification (Covers, Mashups, Style Transfer)
After generation, run a post-generation check that is specific to the route. Every cover, mashup, and style-transfer output is verified against its route checklist before delivery.
Route checklists (cover / mashup / style transfer / emotion prompt), failure-signature table, and revision prompt templates: [`references/error-handling.md`](references/error-handling.md#output-verification-covers-mashups-style-transfer).
## Lyrics Optimizer Behavior
Same as the base skill — when `music_generate` is called without explicit lyrics, MiniMax auto-generates. With this skill, you can also call the `lyrics_generation` API directly to preview the lyrics before generation, or to iterate via the `edit` mode.
If the user wants specific words, the `lyrics_generation` API's `edit` mode lets you modify auto-generated lyrics to match the user's intent without regenerating the whole song.
## Reference Map
- [`references/setup-and-preflight.md`](references/setup-and-preflight.md) — extended pre-flight: platform notes, required/optional dependencies, ask-the-user pattern, local analysis memory
- [`references/mmx-flags-reference.md`](references/mmx-flags-reference.md) — full `mmx` CLI flag reference with worked examples
- [`references/examples.md`](references/examples.md) — practical MiniMax examples with routing, first questions, workflow shapes, and prompt/flag lint catches
- [`references/cover-workflow.md`](references/cover-workflow.md) — one-step and two-step cover workflow with payloads, error handling, use cases
- [`references/lyrics-generation.md`](references/lyrics-generation.md) — the `lyrics_generation` API endpoint, both modes, examples
- [`references/mashup-workflow.md`](references/mashup-workflow.md) — two-song mashup workflow, emotion-to-prompt conversion, decision tree
- [`references/emotion-analysis.md`](references/emotion-analysis.md) — 25+ emotion classifications + per-emotion detection cookbook + emotion combinations + the analysis pipeline
- [`references/emotion-delivery.md`](references/emotion-delivery.md) — 21 emotion recipes for the OUTPUT + iteration loop + common mistakes
- [`references/orchestrator-quickstart.md`](references/orchestrator-quickstart.md) — per-input orchestrator commands (audio, two songs, video, image, YouTube, JioSaavn, combos), extraction guidance, per-song output layout
- [`references/minimax-generation-caveats.md`](references/minimax-generation-caveats.md) — sequential-run rules, output-file verification, duration-is-a-target caveats, and delivery copy templates
- [`references/advanced-audio-analysis.md`](references/advanced-audio-analysis.md) — advanced free tools (Essentia, Demucs, Basic Pitch, Music21, CREPE) for deeper analysis when basic librosa/parselmouth is not enough
- [`references/error-handling.md`](references/error-handling.md) — MiniMax-specific error table, recovery patterns, anti-sparse failure recovery
- [`references/free-tool-inputs.md`](references/free-tool-inputs.md) — MiniMax layer: free-tool routing, blocker checks, and prompt/flag conflict lint before analysis
- [`scripts/check_environment.py`](scripts/check_environment.py) — lightweight preflight diagnostic for Python, env vars, CLI tools, and optional packages
- [`scripts/lint_music_request.py`](scripts/lint_music_request.py) — standard-library helper for routing, blocker, missing-field, prompt, and `mmx` flag conflict checks
- [`scripts/smoke_test.py`](scripts/smoke_test.py) — standard-library smoke tests for pure helper behavior
- [`scripts/`](scripts/) — Python helpers for audio analysis (download, segment, analyze, convert emotion to prompt)
- [`music-craft`](../music-craft/) — base skill with shared concepts (Pre-Flight, anti-sparse, prompt formula, structure tags, Request Intake, User Preference Flow)
- [`music-craft` → references/free-tool-inputs.md](../music-craft/references/free-tool-inputs.md) — base layer for free tool inputs (web_fetch, web_search, image, memory)
- [`references/changelog.md`](references/changelog.md) — release history (v1.1.0, v1.0.0, v0.3.0); operating guidance lives in the topic references