---
name: tour-narrative-scripting
version: 1.0.0
description: Write narration scripts for product tour videos. Produces dual-audience
  narration (technical + non-technical), timed to screen segments, with hooks and
  CTAs. Used by Sara for product tour content.
metadata:
  openclaw:
    emoji: 🎬
---
# Skill: Tour Narrative Scripting
**Owner:** Sara
**Version:** 1.0
**First used:** 2026-03-24 (Reddi Agent Protocol dual-audience tour)
---
## What This Skill Does
Produces dual-use copy from a single tour spec input:
- **Captions** — short labels for slideshow display (≤12 words each)
- **Narration** — 1–2 sentences per step for TTS voiceover
The common failure mode is writing captions optimised for the slide and then discovering the narration needs a full rewrite. This skill avoids that by treating both registers as a single writing unit from the start.
---
## Input
A tour spec (from Phase 1 / Archie) with:
- Step table (ID, title, URL path, audience tag, interaction)
- Narration arc (3–4 sentence story summary)
---
## The Dual-Register Pattern
```
Caption (slideshow): ≤12 words. Noun phrase or fragment OK. No audience assumptions.
Narration (TTS): 1–2 full sentences. Active voice. Natural spoken rhythm. Builds on caption.
```
Caption and narration serve the same step but speak to different contexts:
- The caption is read in silence on a screen. It must orient the viewer instantly.
- The narration is heard, not read. It must sound like a person talking, not a manual.
Write both at the same time for every step. Do not finish all captions then write all narration — the registers will drift.
---
## Output Format
```markdown
## Step 03 — Agent Marketplace [All]
**Caption:** Browse 11 registered specialists — filter by model, rate, reputation
**Narration:** The agent index lists every registered specialist with their model, per-call rate, and on-chain reputation score. No curation, no approval — registration is permissionless.
```
One block per step. Steps numbered with zero-padding (01, 02 …). Audience tag in brackets at the end of the heading.
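
A minimal sketch of how a step block could be rendered in this layout. The function name `format_step_block` and its parameters are hypothetical, chosen for illustration; only the heading shape, the zero-padding, and the two bold labels come from the format above.

```python
# The dash character used in the step heading of the Output Format example.
HEADING_DASH = "\u2014"


def format_step_block(step_id: int, title: str, audience: str,
                      caption: str, narration: str) -> str:
    """Render one step block: heading, caption line, narration line."""
    # :02d zero-pads the step number (3 -> "03").
    heading = f"## Step {step_id:02d} {HEADING_DASH} {title} [{audience}]"
    return "\n".join([
        heading,
        f"**Caption:** {caption}",
        f"**Narration:** {narration}",
    ])


if __name__ == "__main__":
    print(format_step_block(
        3, "Agent Marketplace", "All",
        "Browse registered specialists",
        "The agent index lists every registered specialist.",
    ))
```

Generating the heading from data rather than typing it by hand keeps the numbering and the audience tag consistent across steps.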
---
## Examples
### Good
```
Caption: "Two paths. One protocol."
Narration: "Whether you're offering compute or hiring it, the protocol is the same.
On-chain escrow, blind reputation scoring, and pay-per-call — no monthly
subscriptions, no gatekeepers."
```
Why it works: Caption is a hook, not a description. Narration expands the idea rather than restating what's visible. Both work independently.
### Bad
```
Caption: "The landing page shows the product"
Narration: "This screenshot depicts the landing page of the Reddi Agent Protocol application."
```
Why it fails: Caption is a description of the screenshot rather than a statement about the product. Narration is stiff and reads as a label, not a voice. Neither would pass a spoken-aloud test.
---
## Audience Tagging Rules
| Tag | Who you're writing for | Pronoun use |
|---|---|---|
| `[All]` | Someone who doesn't know which path they're on yet | "you" is fine; avoid "as a specialist" or "as an orchestrator" |
| `[Specialist]` | Someone offering compute / running agents | "you" = specialist — "Your Ollama instance", "Your rate" |
| `[Orchestrator]` | Someone hiring compute / submitting jobs | "you" = orchestrator — "Your brief", "Your escrow" |
Never mix audience assumptions in a single caption. If a step is `[All]`, the caption must make sense to someone on either path.
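
The neutrality rule can be partly automated with a phrase scan. The sketch below is an assumption-laden helper, not part of the skill: `ROLE_PHRASES` is a hypothetical list seeded from the examples in this document, and `find_role_phrases` is an illustrative name. A scan like this only catches known phrases, so it supplements the manual read-through rather than replacing it.

```python
import re

# Hypothetical phrase list, seeded from the examples in this document.
# Extend it per tour; it is not exhaustive.
ROLE_PHRASES = {
    "Specialist": ["as a specialist", "your ollama instance", "your rate", "your model"],
    "Orchestrator": ["as an orchestrator", "your brief", "your escrow", "your agent job"],
}


def find_role_phrases(text: str) -> list[tuple[str, str]]:
    """Return (role, phrase) pairs for role-specific phrases found in text."""
    lowered = text.lower()
    hits = []
    for role, phrases in ROLE_PHRASES.items():
        for phrase in phrases:
            # Word boundaries stop "your rate" from matching inside longer words.
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                hits.append((role, phrase))
    return hits


if __name__ == "__main__":
    # Any hit in an [All] step means the copy assumes one audience.
    print(find_role_phrases("Submit your agent job to the index."))
```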
---
## Timing Checks
Before finalising, read every step aloud:
| Element | Target duration | Action if over |
|---|---|---|
| Caption | ≤3 seconds | Cut words — fragment is fine |
| Narration | 8–15 seconds | Split into two steps rather than speeding TTS |
If narration regularly runs over 15 seconds, the step is trying to carry too much — break it up or move explanation to an earlier step.
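
Reading aloud is the real test, but a word-count estimate can flag likely offenders first. The sketch below assumes a speaking rate of 130-150 words per minute (the range this document quotes for TTS) and uses hypothetical helper names. It reports "long" when the slow-rate estimate exceeds the target and "short" when the fast-rate estimate falls below it.

```python
def estimate_seconds(text: str, wpm: int) -> float:
    """Estimate spoken duration from word count at a words-per-minute rate."""
    return len(text.split()) / wpm * 60


def narration_timing(text: str, slow_wpm: int = 130, fast_wpm: int = 150,
                     min_s: float = 8.0, max_s: float = 15.0) -> str:
    """Classify narration as 'short', 'ok', or 'long' against the 8-15 s target."""
    slowest = estimate_seconds(text, slow_wpm)  # longest plausible duration
    fastest = estimate_seconds(text, fast_wpm)  # shortest plausible duration
    if slowest > max_s:
        return "long"   # may run over: split the step, do not speed up TTS
    if fastest < min_s:
        return "short"
    return "ok"


if __name__ == "__main__":
    sample = " ".join(["word"] * 26)  # stand-in for a 26-word narration
    print(round(estimate_seconds(sample, 130), 1), narration_timing(sample))
```

Word counts ignore pauses and emphasis, so treat the result as a first filter before the spoken-aloud pass.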
---
## Common Mistakes
**Narration describes what's visible.**
Cut "Here we can see…" and "This shows…" entirely. The viewer can see the screenshot. Narration should describe what it *means*, not what it *is*.
**Over-explaining mechanics in every step.**
Establish the core protocol concept once (ideally in steps 1–3). After that, reference it briefly ("the same escrow mechanism") rather than re-explaining from scratch. Repetition reads as distrust of the audience.
**Losing the narrative thread.**
Each step should feel like the next sentence of the same story. Read the narration for all steps consecutively, ignoring the screenshots. Does it flow as a coherent 60–90 second monologue? If it reads as disconnected bullet points, rewrite for continuity.
**Pronoun drift in dual-audience tours.**
An `[All]` step that accidentally uses "your agent job" (orchestrator framing) alienates specialists. Check every pronoun in `[All]` steps against the audience neutrality rule.
**Saving narration for after captions are done.**
Captions written purely for the eye often use fragment logic ("Filter by model. Rate. Reputation.") that needs rewriting for the ear ("You can filter by model, rate, or reputation score."). Draft both together; adjust one register, then the other.
---
## Checklist Before Handoff to Kit (Phase 3)
- [ ] Every step has both a caption and a narration line
- [ ] All captions ≤12 words
- [ ] All narration passes the spoken-aloud timing test (8–15s)
- [ ] `[All]` steps contain no audience-specific pronouns
- [ ] Narration arc is traceable across all steps end-to-end
- [ ] Output formatted as step blocks (see Output Format above)
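
The structural items on this checklist (both fields present, caption length) can be checked mechanically before the manual pass. This is a sketch under assumptions: `lint_script` is a hypothetical helper, and it expects each step to use the heading and bold labels from the Output Format section. The timing, pronoun, and narrative-arc items still need a human read.

```python
import re

# Matches headings such as "## Step 03 <separator> Agent Marketplace [All]".
STEP_HEADING = re.compile(r"^## Step (\d{2}) .* \[(All|Specialist|Orchestrator)\]$")


def lint_script(markdown: str, max_caption_words: int = 12) -> list[str]:
    """Return human-readable problems found in a narration script."""
    problems = []
    current = None  # ID of the step currently being read
    seen = {}       # step ID -> set of fields found for that step
    for raw in markdown.splitlines():
        line = raw.strip()
        heading = STEP_HEADING.match(line)
        if heading:
            current = heading.group(1)
            seen[current] = set()
        elif current and line.startswith("**Caption:**"):
            seen[current].add("caption")
            tokens = line[len("**Caption:**"):].split()
            # Count only tokens containing a letter or digit, so a lone dash is skipped.
            words = [t for t in tokens if any(c.isalnum() for c in t)]
            if len(words) > max_caption_words:
                problems.append(f"Step {current}: caption has {len(words)} words")
        elif current and line.startswith("**Narration:**"):
            seen[current].add("narration")
    for step_id, fields in seen.items():
        for field in ("caption", "narration"):
            if field not in fields:
                problems.append(f"Step {step_id}: missing {field}")
    return problems
```

An empty list means the structural checks pass.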
---
## Related
- Playbook: `playbooks/product-tour/PLAYBOOK.md`
- Upstream: Phase 1 spec from Archie
- Downstream: Phase 3 screenshot capture by Kit (captions used as filename hints); Phase 4 video narration by Finn (narration fed directly to TTS)
Added explicit inputs (tour spec structure, TTS rate context), procedural steps with input/output markers, decision points for over-length narration and pronoun drift, an output contract with formatting template and full checklist, and an outcome signal tied to downstream use and narrative coherence.
source: clawhub
Write captions and narration for product tour videos as a single drafting unit, not sequentially. This skill outputs dual-register copy (slideshow captions of 12 words or fewer, TTS narration of 1-2 sentences) that maintains narrative coherence across steps while respecting audience-specific framing. Use it when a tour must speak to multiple user personas and you want to avoid separate caption and voiceover passes, which cause register drift and rework.
Tour spec document (from Phase 1, typically authored by Archie):
- Step table: ID, title, URL path, audience tag (`[All]`, `[Specialist]`, `[Orchestrator]`), interaction description

External connections:

Context needed:
1. Read the full tour spec and narration arc end-to-end. Absorb the story shape, the user journey, and the conceptual hook. Flag any steps where the audience tag shifts (e.g., `[All]` to `[Specialist]`). Output: a mental model of the entire tour's flow.
2. For each step, write the caption first. Aim for 8-12 words. Use noun phrases or fragments ("Browse registered specialists — filter by model, rate, reputation"). Do not describe what is visible in the screenshot; label the idea or action the step introduces. The caption must orient a silent viewer in under 3 seconds. Output: caption text.
3. Immediately write the narration for the same step. Use 1-2 full sentences in active voice, with natural spoken rhythm (read it aloud). Ground the narration in the caption's idea, but expand it with context, rationale, or consequence. Never repeat what the caption says; build on it. Output: narration text (target 8-15 seconds when read aloud at 130-150 wpm).
4. Check audience tag alignment. If the step is `[All]`, ensure no pronoun or framing assumes specialist or orchestrator knowledge. A bare "you" is fine; phrases like "your agent job" or "your model" are not. If the step is `[Specialist]` or `[Orchestrator]`, pronouns must clearly map to that role. Output: revised caption and narration if pronouns drift.
5. Read the full narration sequence aloud (all steps, no screenshots). Does it sound like a coherent 60-90 second monologue, or like disconnected bullet points? Flag any step where the narration does not follow logically from the previous step. Check for over-explanation: if step 7 re-explains core protocol material already covered in step 2, cut it and use a brief reference ("the same escrow mechanism") instead. Output: list of steps needing a rewrite for continuity.
6. Rewrite flagged steps. Prioritize narrative flow over individual step polish. A step's narration should feel like the next sentence of a single story, not a standalone explainer. Output: revised narration for each flagged step.
7. Format all steps as step blocks (see the output contract below). Number with zero-padding (01, 02, etc.). Include the audience tag in the heading. Output: formatted Markdown document ready for handoff.
8. Execute the checklist (see the output contract). Confirm that every step has both a caption and narration, all captions are 12 words or fewer, all narration fits the 8-15 second target, no `[All]` step has pronoun drift, and the output is formatted per spec. Output: signed-off document, or the checklist with all items checked.

**If narration runs over 15 seconds consistently across multiple steps:**
Do not speed up TTS. Instead, split the step into two steps, or move explanation to an earlier step and reference it briefly later. Narration that feels rushed fails both comprehension and the spoken-aloud test.
**If an `[All]` step accidentally uses audience-specific pronouns:**
Rewrite the caption and narration to use neutral language ("users can", "the protocol handles") or a generic "you" that applies equally to specialists and orchestrators. Test the step against both audience personas before finalizing.
**If captions were drafted separately before narration:**
Do not simply pair them as-is. Captions written purely for the eye often use fragment logic that breaks when read aloud. Re-draft both caption and narration together for that step, even if it means starting over.
**If the narration describes what is visible in the screenshot:**
Cut the description entirely. Remove phrases like "here we can see", "this shows", and "the screen displays". Narration should explain what the screenshot means or what the user does next, not what the screenshot is. The viewer can see it.
**If the narration arc (story summary) conflicts with a step's framing:**
Escalate to Phase 1 (Archie) for spec clarification. Do not invent new narrative logic to reconcile the conflict. Output: escalation note with context.
Output contract format: a Markdown document with one step block per step. Each block follows this template:

    ## Step [zero-padded ID] — [Step Title] [Audience Tag]
    **Caption:** [8-12 words, noun phrase or fragment, oriented to the idea not the screenshot]
    **Narration:** [1-2 full sentences, active voice, 8-15 seconds at 130-150 wpm, builds on caption without restating it]

Example:

    ## Step 03 — Agent Marketplace [All]
    **Caption:** Browse 11 registered specialists — filter by model, rate, reputation
    **Narration:** The agent index lists every registered specialist with their model, per-call rate, and on-chain reputation score. No curation, no approval — registration is permissionless.

File location: save as `[tour-name]-narration.md` in the Phase 2 output folder (or as specified by downstream Phase 3/Kit).
Checklist before handoff:
- [ ] `[All]` steps contain zero audience-specific pronouns or framing

You know the skill worked when:
- **The narration reads naturally aloud.** When you read the full script (all steps, no breaks) at a normal speaking pace, it sounds like one person telling a story, not a series of slide labels or manual entries.
- **Captions orient viewers instantly.** A non-expert can glance at each caption (3 seconds) and understand what action or idea the step introduces, even without the narration or screenshot.
- **Downstream teams use the output as-is.** Phase 3 (Kit, screenshot capture) uses captions as naming hints without rework. Phase 4 (Finn, video narration) feeds the narration directly to TTS with zero rewrites for timing or clarity.
- **Both specialists and orchestrators watch the same `[All]` steps without alienation.** A specialist reading "your agent job" in an `[All]` step is a failure signal; a clean `[All]` step works equally for both personas.
- **The tour avoids over-explanation.** The core protocol is established once (usually in steps 1-3); later steps reference it briefly ("the same escrow mechanism") without re-teaching it.
- **The script length matches the target video duration.** Total narration time, summed across steps and read aloud, lands within 10% of your target video length (e.g., 60-90 seconds for a short product tour).
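
The script-length check described above can be approximated before recording. This sketch assumes a single speaking rate of 140 wpm, the midpoint of the 130-150 wpm range quoted in this document; both function names are hypothetical.

```python
def total_seconds(narrations: list[str], wpm: int = 140) -> float:
    """Estimate total spoken time for all narration lines at the given rate."""
    words = sum(len(n.split()) for n in narrations)
    return words / wpm * 60


def within_tolerance(actual_s: float, target_s: float, tolerance: float = 0.10) -> bool:
    """True if actual_s is within +/- tolerance (as a fraction) of target_s."""
    return abs(actual_s - target_s) <= tolerance * target_s


if __name__ == "__main__":
    # Stand-in narration lines; replace with the real per-step narration.
    lines = [" ".join(["word"] * 35) for _ in range(5)]
    estimate = total_seconds(lines)
    print(round(estimate, 1), within_tolerance(estimate, 75.0))
```

If the estimate misses the target, adjust the step count or trim narration rather than changing the TTS speed.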