Wjs Translating Subtitles

Use when the user has an SRT (or transcript text) in one language and wants it translated to another, with punctuation-bounded re-segmentation so cues end at...

SkillRank score: 8.2/10

evaluated by implexa, claude-haiku-4-5 · 2026-05-31

wjs-translating-subtitles converts source-language SRT to target-language or bilingual SRT with mandatory punctuation-bounded re-segmentation. handles simplified chinese and english as first-class targets, with detailed translation principles and output validation rules.

structure: 9.0
trigger phrases: 9.0
procedure: 8.0
edge cases: 8.0
documentation: 8.0

original SKILL.md from clawhub

---
name: wjs-translating-subtitles
description: Use when the user has an SRT (or transcript text) in one language and wants it translated to another, with punctuation-bounded re-segmentation so cues end at real sentence breaks. Simplified Chinese (zh-CN) and English (en) are first-class targets; other targets follow the same rules. Outputs a target-language SRT or bilingual SRT — no audio, no burn-in. Triggers — "翻译字幕", "翻成中文", "translate this SRT", "中英双语字幕", "把这个 SRT 翻译成 X", "bilingual subtitles".
---

# wjs-translating-subtitles

Source-language SRT in → target-language (or bilingual) SRT out. **This skill is text-only.** Burn-in lives in `/wjs-burning-subtitles`; voice dub in `/wjs-dubbing-video`.

## When to use

- User has an SRT in language A and wants it in language B.
- User pasted a transcript (with or without timestamps) and wants a translation that becomes an SRT.
- User has an SRT but cues end mid-sentence — this skill's re-segmentation step fixes that.

## When NOT to use

- No source-language SRT yet → run `/wjs-transcribing-audio` first.
- User wants burned-in subtitles → finish translation here, then `/wjs-burning-subtitles`.
- User wants voice dub → finish translation here, then `/wjs-dubbing-video`.

## Pick the target

Resolve the target from the user's phrasing once; don't re-ask:

- "翻成中文 / 中文字幕 / 中文配音" → `zh-CN`.
- "translate to English / English subs / English dub" → `en`.
- "bilingual" / "双语" → produce both `.<source>.srt` and `.<target>.srt` (and optionally a combined `.<source>-<target>.srt`).
- Ambiguous → default to whichever the user has historically chosen in the project.

Simplified Chinese and English are fully validated. Other targets (Japanese, Korean, French, etc.) work via the same rules; the bottleneck is TTS-voice availability if dubbing follows — see `/wjs-dubbing-video` before promising.

## Shared translation principles

- Prioritize meaning over literal wording.
- Use concise subtitle-style language — viewers read at ~3 wps for Chinese, ~3–4 wps for English; lines that exceed that go off-screen before they can be read.
- Preserve the tone of the speaker. Casual source → casual target; formal source → formal target.
- Do not over-translate names, brands, cultural references, or technical terms.
- Keep numbers, dates, names, and places accurate.
- If a phrase has no exact equivalent, translate the meaning naturally. No literal/word-for-word constructions.
- Avoid stiff, machine-translated output.

## Translating into Simplified Chinese (zh-CN)

- Use natural spoken Mandarin for casual speech, formal Mandarin for formal speech.
- Use Simplified characters only (do NOT use Traditional Hanzi unless the user explicitly asks).
- Subtitle lines should be roughly **15 Chinese characters** or fewer per line, max 2 lines per cue (3 only when unavoidable for very long cues).
- Use Chinese punctuation: 「，」「。」「；」「：」「、」「——」. Never mix English commas/periods into Chinese subtitles.
- **Minimize filler demonstratives 「这」「那」「这个」「那个」「那份」「那种」「那里」「那样」.** Spanish-to-Chinese (and English-to-Chinese) MT routinely inserts these because the source has overt demonstratives that Chinese usually drops. Examples:
  - "这把我们带入二元世界的载体" → "把我们带入二元的载体"
  - "运用那份能量" → "运用这股能量" if needed, or just "运用能量"
  - "正是在这合一里" → "正是在合一中"
  - "像罪人那样翻滚" → "像罪人翻滚" / "像罪人般翻滚"
  - "那份精微的觉知" → "精微的觉知"
  Keep them only when they carry real meaning (deixis, contrast, or fixed phrase like spiritual "我就是那" / "tat tvam asi"). Default is to delete; add back only if the sentence becomes ambiguous.

Examples (Spanish → Chinese):

```text
Spanish: No pasa nada.            → Chinese: 没关系。
Spanish: Vamos a ver qué pasa.    → Chinese: 我们看看会发生什么。
Spanish: Me parece una locura.    → Chinese: 我觉得这太疯狂了。
Spanish: ¿Qué quieres decir?      → Chinese: 你是什么意思？
Spanish: La verdad es que no lo esperaba.
                                  → Chinese: 说实话，我没想到会这样。
```

## Translating into English (en)

- Use natural conversational English. Avoid translationese ("It is precisely through entering the body…" → "It's by entering the body…").
- Lines should be roughly **40–42 characters** or fewer (about 7–9 words), max 2 lines per cue. Hard cap 50 chars per line.
- Use ASCII punctuation: `,` `.` `;` `:`, plus the em-dash `—` as the one non-ASCII exception. Avoid Unicode curly quotes; this keeps the `.srt` portable.
- For contemplative/spiritual content, prefer plain words over Latinate jargon: "presence" over "manifestation," "wholeness" over "totality," "wake up" over "awaken to consciousness."

Examples (Spanish → English):

```text
Spanish: No pasa nada.            → English: It's nothing.
Spanish: Vamos a ver qué pasa.    → English: Let's see what happens.
Spanish: Me parece una locura.    → English: This feels crazy to me.
Spanish: ¿Qué quieres decir?      → English: What do you mean?
Spanish: La verdad es que no lo esperaba.
                                  → English: Honestly, I wasn't expecting this.
```
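
The line-length guideline above can be checked mechanically. The following is a minimal sketch, assuming the 42-character target and 50-character hard cap stated in this section; the function names `wrap_english_cue` and `needs_split` are hypothetical and not part of the skill.

```python
import textwrap

EN_TARGET_WIDTH = 42  # soft per-line target from the guideline above
EN_HARD_CAP = 50      # hard cap per line
MAX_LINES = 2         # max lines per cue


def wrap_english_cue(text: str, width: int = EN_TARGET_WIDTH) -> list[str]:
    """Wrap one cue's text into lines of at most `width` characters."""
    # break_long_words=False keeps long names or URLs intact instead of cutting them.
    return textwrap.wrap(text, width=width, break_long_words=False)


def needs_split(lines: list[str]) -> bool:
    """True when the cue should become two cues instead of being squeezed."""
    too_many_lines = len(lines) > MAX_LINES
    too_long = any(len(line) > EN_HARD_CAP for line in lines)
    return too_many_lines or too_long


lines = wrap_english_cue("Honestly, I wasn't expecting this to turn out the way it did.")
print(lines, needs_split(lines))
```

When `needs_split` returns true, the intended response is to split the cue with interpolated timestamps rather than to drop meaning.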

## Re-segment at punctuation boundaries (mandatory)

Whisper segments by silence/breath, not grammar. The result almost always has cues that **end mid-sentence** (e.g., "...es una forma de aterrizar," next cue starts "el espíritu en el cuerpo..."). Any TTS that processes one cue at a time will then insert an unnatural pause exactly where the original speaker did not. The fix is mandatory before dubbing — and improves on-screen reading too.

Punctuation set differs:

- Chinese cues must end at `，` `。` `；` `：` `——` or `、`.
- English cues must end at `,` `.` `;` `:` `—` (em-dash) or, in practice for subtitles, occasionally a single dash. Never end an English cue on a comma-less clause break, and never split inside a phrase like "kind of" or "in order to".

Rules:

- **Every cue must end at a real punctuation mark.** Never let a cue end on a noun, verb, conjunction, or article that flows into the next cue.
- It is fine (and often necessary) to **split** a single source cue into 2–4 shorter cues, with timestamps interpolated by character position within the original cue's duration.
- It is fine to **merge** the tail of one source cue with the head of the next when they form one clause — the merged cue inherits the start of the first and the end of the second.
- Target 3–8 seconds per cue. Cues shorter than ~1.5s feel choppy on screen; cues longer than ~10s usually contain a missed punctuation break.

A typical 2–3 minute talk yields roughly 25–40 punct-bounded cues from 12–18 raw source cues. Don't try to keep the original cue count.
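
The split rule above (timestamps interpolated by character position) can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: times are integer milliseconds, cue text is a single line, and `split_at_punctuation` is a hypothetical name. It splits at every break mark, so a real pass would still merge pieces shorter than about 1.5 s and would need extra care with decimals such as "3.5" in English.

```python
import re

# Cue-ending marks from the lists above; the double dash is tried before single marks.
ZH_BREAKS = r"——|[，。；：、]"
EN_BREAKS = r"—|[,.;:]"


def split_at_punctuation(text: str, start_ms: int, end_ms: int, breaks: str = ZH_BREAKS):
    """Split one cue into punctuation-bounded pieces with interpolated times.

    Returns a list of (start_ms, end_ms, text). Each piece ends at a break mark,
    except a trailing remainder that has no mark at all.
    """
    pattern = rf".*?(?:{breaks})|.+$"
    pieces = [p for p in re.findall(pattern, text) if p.strip()]
    total_chars = sum(len(p) for p in pieces)
    duration = end_ms - start_ms
    cues, consumed = [], 0
    for piece in pieces:
        piece_start = start_ms + duration * consumed // total_chars
        consumed += len(piece)
        piece_end = start_ms + duration * consumed // total_chars
        cues.append((piece_start, piece_end, piece.strip()))
    return cues


print(split_at_punctuation("说实话，我没想到会这样。", 0, 6000))
```

Because each piece starts exactly where the previous one ends, the resulting cues never overlap.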

When TTS dubbing follows: the punctuation-bounded structure means each TTS clip is a complete utterance with proper end-intonation, and concatenating clips sounds natural because every join is at a real pause point.

## SRT output rules

```text
1
00:00:01,200 --> 00:00:04,800
中文字幕内容

2
00:00:04,800 --> 00:00:08,500
中文字幕内容
```

- Number subtitles sequentially starting from `1`.
- Timestamp format: `HH:MM:SS,mmm`. Comma milliseconds, **never** period milliseconds.
- Do not overlap timestamps.
- Preserve the original timing unless adjustment is necessary.
- Each subtitle should usually be 1–2 lines.
- If one subtitle is too long, split it into shorter subtitles when timing allows.
- Do not add commentary inside the subtitle file.
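
The numbering and timestamp rules above can be illustrated with a small writer. This is a sketch assuming cues are held as `(start_ms, end_ms, text)` tuples with integer milliseconds; `format_timestamp` and `render_srt` are hypothetical helper names.

```python
def format_timestamp(ms: int) -> str:
    """Format integer milliseconds as HH:MM:SS,mmm (comma before milliseconds)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def render_srt(cues) -> str:
    """Render (start_ms, end_ms, text) tuples as SRT text, numbered from 1."""
    blocks = []
    for number, (start_ms, end_ms, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    # Joining with a newline leaves exactly one blank line between cues.
    return "\n".join(blocks)


print(render_srt([(1200, 4800, "没关系。"), (4800, 8500, "我们看看会发生什么。")]))
```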

## Bilingual output

When the user asks for bilingual: source on first line, target on second:

```text
1
00:00:01,200 --> 00:00:04,800
No pasa nada.
没关系。
```

Rules:

- Keep source first, target second.
- Preserve timing.
- Avoid adding extra explanations unless requested.
- Keep both lines short enough to read.
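
As a rough illustration of the bilingual rules above, the sketch below pairs source and target cues that already share the same boundaries. It assumes both lists hold `(start_ms, end_ms, text)` tuples and have been re-segmented to identical timings beforehand; `make_bilingual` is a hypothetical name.

```python
def make_bilingual(source_cues, target_cues):
    """Pair aligned cues: source text on the first line, target on the second."""
    if len(source_cues) != len(target_cues):
        raise ValueError("source and target must share the same cue boundaries")
    merged = []
    for (start, end, source_text), (t_start, t_end, target_text) in zip(source_cues, target_cues):
        if (start, end) != (t_start, t_end):
            raise ValueError(f"timing mismatch at {start} ms")
        merged.append((start, end, f"{source_text}\n{target_text}"))
    return merged


print(make_bilingual([(1200, 4800, "No pasa nada.")], [(1200, 4800, "没关系。")]))
```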

## Output formats

Depending on the user request, provide one or more:

1. Target-only `.srt`
2. Bilingual `.srt` (source line + target line)
3. Target transcript without timestamps
4. Side-by-side source/target table

Default output for "translate this SRT" with no other modifiers: **target-only `.srt`** + a short uncertainty note if needed.

## File naming

```text
input.srt                          # source (e.g., from /wjs-transcribing-audio)

translated outputs:
  input.zh-CN.srt                  # Simplified Chinese only
  input.en.srt                     # English only
  input.es-zh.srt                  # Spanish + Chinese bilingual
  input.es-en.srt                  # Spanish + English bilingual
  input.es-zh-en.srt               # three-language
```

BCP-47-style suffixes make the target language obvious at a glance and keep multiple target-language outputs side-by-side.
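
The naming scheme above could be generated rather than typed by hand. A minimal sketch, assuming the caller passes the language codes exactly as they should appear in the suffix; `output_path` is a hypothetical helper.

```python
from pathlib import Path


def output_path(source: str, *langs: str) -> Path:
    """Build '<stem>.<lang>[-<lang>...].srt' next to the source file."""
    if not langs:
        raise ValueError("at least one language code is required")
    src = Path(source)
    suffix = "-".join(langs)
    return src.with_name(f"{src.stem}.{suffix}.srt")


print(output_path("input.srt", "zh-CN"))           # target-only
print(output_path("input.srt", "es", "zh"))        # bilingual
print(output_path("input.srt", "es", "zh", "en"))  # three-language
```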

## Handling unclear audio markers

If the source SRT contains `[inaudible]` or `[unclear]`:

- Translate the surrounding context naturally.
- Keep the bracketed marker in the target SRT (don't invent content).
- If a `[unclear]` chunk makes a cue ungrammatical in the target language, leave it bracketed and add a note in the response (not in the SRT file).

## Quality gate before handoff

- Subtitle numbers are sequential
- Timestamps are valid (`HH:MM:SS,mmm`, no overlap)
- Milliseconds use commas
- Translation is natural; speaker tone preserved
- Line length within platform/cue caps
- Proper nouns accurate
- No cue ends mid-clause / mid-phrase
- No invented content

## Downstream

- **`/wjs-burning-subtitles`** — burn this SRT onto the video, or soft-mux as a togglable track.
- **`/wjs-dubbing-video`** — generate a TTS voice dub from this SRT, time-aligned to the original timing.
- **For bilingual playback**: most platforms can soft-mux multiple subtitle tracks, but if you need bilingual *visible at once*, burn the `*.source-target.srt` directly via `/wjs-burning-subtitles`.

## Common pitfalls

- **Letting the cue end mid-sentence after translation.** The source's silence-aligned cues are unsafe boundaries; re-segment at punctuation, always.
- **Filler demonstratives in Chinese output.** MT inserts 「这」/「那」 because the source had `eso/that`. Delete them aggressively.
- **Period milliseconds.** Whisper local writes `.mmm`; SRT spec is `,mmm`. Always normalize.
- **Translating proper nouns.** Brand names, place names, technical terms — leave as-is or use the conventional target-language version (e.g., "OpenAI" stays, "New York" → "纽约").
- **Over-shortening for cue caps.** If a line is genuinely longer than the cap, split into two cues with interpolated timestamps; don't drop meaning to fit the cap.
- **Forgetting to do re-segmentation when no dub is requested.** The punct-bounded SRT is also better for *reading* — line endings at natural pauses match how viewers scan. Re-segment even when burn-only.
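
The period-milliseconds pitfall above is easy to fix mechanically. A minimal sketch, assuming timing lines are the only lines containing `-->`; `normalize_milliseconds` is a hypothetical name. Restricting the substitution to timing lines keeps subtitle text such as "Version 2.0" untouched.

```python
import re

PERIOD_MS = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3})")


def normalize_milliseconds(srt_text: str) -> str:
    """Rewrite HH:MM:SS.mmm as HH:MM:SS,mmm on timing lines only."""
    fixed = []
    for line in srt_text.splitlines():
        if "-->" in line:
            line = PERIOD_MS.sub(r"\1,\2", line)
        fixed.append(line)
    return "\n".join(fixed)


print(normalize_milliseconds("1\n00:00:01.200 --> 00:00:04.800\nVersion 2.0 ships."))
```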

added explicit inputs section with external connection clarity, expanded procedure into 7 numbered steps with clear input/output for each, formalized decision points for language ambiguity and downstream workflow, and added detailed outcome signals tied to subtitle editor and player validation.

wjs-translating-subtitles

Item: Wjs Translating Subtitles
Rating: 8.2
Author: Implexa

source-language SRT in, target-language (or bilingual) SRT out. text-only. burn-in lives in /wjs-burning-subtitles; voice dub in /wjs-dubbing-video.

intent

use this skill when a user has an SRT file or transcript (with or without timestamps) in one language and wants it translated to another. the skill translates the content, then re-segments cues at real punctuation boundaries so they don't end mid-clause. this fixes the common problem where auto-transcribed audio has cues that break at silence points, not grammar, making them unreadable and causing unnatural pauses if TTS dubbing follows. output is a clean, punctuation-bounded SRT in the target language, optionally bilingual. simplified Chinese (zh-CN) and English (en) are first-class targets; other languages follow the same rules but may have TTS voice limits downstream.

inputs

source SRT or transcript: file or pasted text, any language. if no timestamps, create them (assume even distribution across stated duration or ask user for duration).
source language: infer from content or ask user once if ambiguous.
target language(s): resolve from user phrasing (see decision points below). do not re-ask.
bilingual or target-only: infer from triggers like "bilingual" / "双语" vs. "translate to X". if ambiguous, default to target-only.
external connection: none required. all work is text processing, no APIs.
optional context: user's historical language choice in project (used to break ties).

procedure

1. parse and validate source
- input: raw SRT file or transcript text.
- if SRT: extract cue number, timecode range (HH:MM:SS,mmm format), and text for each cue.
- if plain transcript without timecodes: ask user for total duration or assume even distribution. interpolate start/end times across cues so each cue is roughly 3-8 seconds.
- output: structured list of (cue_id, start_time, end_time, source_text).
- error handling: if timestamps overlap or milliseconds use period instead of comma, normalize to comma format and flag for review.
2. detect source language and target language
- input: source_text sample, user phrasing.
- apply language detection heuristic (character distribution, common words, script type).
- resolve target from user triggers: "翻成中文" / "中文字幕" → zh-CN; "translate to English" / "English subs" → en; "bilingual" / "双语" → both; other codes (ja, ko, fr, etc.) map directly.
- if target is ambiguous, consult project history; if none exists, prompt user once.
- output: (source_lang, target_lang, bilingual_flag).
3. translate each cue (whole-cue translation, not segmented)
- input: each (cue_id, start_time, end_time, source_text) tuple.
- apply target-language rules (see below): tone preservation, subtitle-style conciseness, no translationese, proper noun handling.
- for zh-CN: enforce simplified characters, ~15 chars/line max, delete filler demonstratives (这, 那, 这个, etc.) unless they carry real deixis or contrast. use Chinese punctuation only.
- for en: aim for ~40-42 chars/line (7-9 words), no stiff jargon, plain words over Latinate forms. use ASCII punctuation only.
- for other targets: apply same rules as en or zh-CN as appropriate (e.g., ja subtitles follow zh-CN line-length caps; ko follows en rhythm).
- preserve [inaudible] / [unclear] markers; keep them bracketed, do not invent content.
- output: list of (cue_id, start_time, end_time, target_text).
4. re-segment at punctuation boundaries (mandatory)
- input: translated cues from step 3, target-language punctuation set.
- for each cue, check the end character:
  - zh-CN cues must end at ， 。 ； ： —— or 、.
  - en cues must end at , . ; : or — (em-dash). never end on a comma-less clause break or inside phrases like "kind of".
- if a cue ends mid-clause or mid-sentence:
  - look ahead to the next punctuation mark in the same or next cue.
  - merge or split as needed: keep the content together until a real break, then interpolate new timestamps by character position.
- if splitting one cue into multiple: distribute duration proportionally by character count. if merging cues: inherit start time of first and end time of last.
- target 3-8 seconds per cue (avoid <1.5s choppiness, avoid >10s oversized cues).
- typical outcome: 2-3 minute talk yields 25-40 punct-bounded cues from 12-18 raw source cues.
- output: re-segmented list of (new_cue_id, new_start_time, new_end_time, target_text).
5. format output SRT
- input: re-segmented cues from step 4.
- for each cue, write:
```
<sequential_number>
<HH:MM:SS,mmm> --> <HH:MM:SS,mmm>
<text_line_1>
[<text_line_2>]
```
- enforce rules:
  - number sequentially from 1.
  - use comma milliseconds (,mmm), never period.
  - no overlapping timecodes.
  - max 1-2 lines per cue (3 only if unavoidable).
  - no blank lines inside a cue; exactly one blank line between consecutive cues.
- output: target-language .srt file, e.g., input.zh-CN.srt or input.en.srt.
6. if bilingual requested: create bilingual SRT
- input: original source SRT from step 1, target SRT from step 5. re-segment both to a common cue boundary (align at punctuation in both languages).
- for each cue, write:
```
<sequential_number>
<HH:MM:SS,mmm> --> <HH:MM:SS,mmm>
<source_line>
<target_line>
```
- if source and target have different natural break points, break at whichever comes first and carry the overflow text into the next cue, which keeps its own non-overlapping timecode.
- output: input.source-target.srt, e.g., input.es-zh.srt.
7. quality gate before handoff
- verify:
  - cue numbers are sequential, no gaps.
  - all timecodes valid (HH:MM:SS,mmm format, no overlap, end > start).
  - translation is natural and in-voice (not stiff, not over-literal).
  - speaker tone preserved (casual stays casual, formal stays formal).
  - line lengths within target language caps.
  - proper nouns and numbers are accurate and not invented.
  - no cue ends mid-clause, mid-phrase, or mid-word.
  - [inaudible] / [unclear] markers preserved, not invented around.
- if any gate fails, flag the specific cue(s) and fix before output.
- output: pass/fail signal.
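
The first step of the procedure (parse and validate source) might look like the sketch below. It is an illustration only: it assumes well-formed cue blocks separated by blank lines, accepts both comma and period milliseconds so that either form is normalized to integer milliseconds, and uses the hypothetical names `parse_srt` and `to_ms`.

```python
import re

CUE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text: str):
    """Return a list of (cue_id, start_ms, end_ms, source_text) tuples."""
    # Normalize CRLF line endings and drop a leading byte-order mark if present.
    text = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    cues = []
    for match in CUE.finditer(text):
        groups = match.groups()
        start_ms = to_ms(*groups[1:5])
        end_ms = to_ms(*groups[5:9])
        cues.append((int(groups[0]), start_ms, end_ms, groups[9].strip()))
    return cues


sample = (
    "1\n00:00:01,200 --> 00:00:04,800\nNo pasa nada.\n\n"
    "2\n00:00:04.800 --> 00:00:08.500\nVamos a ver\nqué pasa.\n"
)
print(parse_srt(sample))
```

Working in integer milliseconds from this point on keeps later interpolation and overlap checks simple; the `HH:MM:SS,mmm` form is only needed again when writing the output file.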

decision points

target language ambiguous: user says "translate this" with no language named. check project history for most recent target choice. if none, ask user once ("Chinese or English?"). do not re-ask in same skill run.
bilingual or target-only: user says "bilingual" / "双语" or "side-by-side" or "source + target" → bilingual. user says "translate to X" or "only X subtitles" or "X version" → target-only. if unclear, default to target-only (simpler output).
source language unclear: auto-detect fails or user's input is mixed-language. ask user once ("is this Spanish or Portuguese?"). do not block on this; proceed with best-guess if user doesn't answer in ~30 seconds.
timestamps missing from source: user pasted plain transcript with no timecodes. ask total duration. if user provides duration, interpolate evenly across cues. if user says "don't know," assume typical talk pace (130-150 wpm for English, 180-220 cpm for Chinese) and estimate duration. flag in output that timings are estimated.
re-segmentation creates very short cues (<1.5s): acceptable if necessary to honor punctuation. note in output if multiple exist; don't force merger.
source cue already ends at punctuation, target doesn't after translation: e.g., source ends with period, but target translation adds a clause. re-segment target to match punctuation rule. if no punct boundary exists within the cue, split at a natural clause break and interpolate timing.
[inaudible] or [unclear] in source: keep bracketed marker in target. if it renders the target sentence ungrammatical (rare in translation), add a note in the response: "cue X: [unclear] segment breaks grammatical flow in target; kept as-is for source alignment."
downstream is TTS dubbing: this skill's output feeds /wjs-dubbing-video. the punctuation-bounded structure is mandatory; each cue boundary must be a real pause point so TTS clips concatenate naturally.
downstream is burn-in: this skill's output feeds /wjs-burning-subtitles. re-segmentation also improves on-screen readability; apply it regardless of whether dubbing follows.
rate limits or network timeout (if external MT API used): none in current design (text-only, local processing assumed). if future version integrates cloud MT, retry up to 2x with exponential backoff. timeout after 30s per cue batch.
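
For the missing-timestamps decision point above, the duration estimate and even distribution could be sketched as follows. The rates are the midpoints of the ranges given there (140 wpm for English, 200 cpm for Chinese); counting only CJK ideographs for Chinese is an assumption, and both function names are hypothetical.

```python
def estimate_duration_ms(text: str, lang: str) -> int:
    """Estimate speaking time from word count (English) or character count (Chinese)."""
    if lang == "zh-CN":
        # Assumption: count only CJK unified ideographs, not punctuation.
        units = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
        per_minute = 200  # midpoint of 180-220 cpm
    else:
        units = len(text.split())
        per_minute = 140  # midpoint of 130-150 wpm
    return round(units * 60_000 / per_minute)


def distribute_evenly(lines, total_ms: int):
    """Give each transcript line an equal, non-overlapping share of the duration."""
    count = len(lines)
    return [
        (total_ms * i // count, total_ms * (i + 1) // count, line)
        for i, line in enumerate(lines)
    ]


print(distribute_evenly(["a", "b", "c"], 9000))
```

Timings produced this way are estimates and should be flagged as such in the response.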

output contract

success produces one or both of:

target-only SRT file (default)
- filename: <input_stem>.<target_lang>.srt (e.g., talk.zh-CN.srt, talk.en.srt).
- format: standard SRT (see procedure step 5).
- content: punctuation-bounded cues, natural translation, no invented content.
- encoding: UTF-8, LF line endings.
bilingual SRT file (if requested)
- filename: <input_stem>.<source_lang>-<target_lang>.srt (e.g., talk.es-zh.srt).
- format: standard SRT with source line 1, target line 2 per cue.
- content: aligned source and target, both punctuation-bounded where possible.
- encoding: UTF-8, LF line endings.
optional human-readable summary (response text, not in file)
- cue count before and after re-segmentation.
- any cues flagged for manual review (e.g., [unclear] segments, very long lines that were split, inferred timestamps).
- confidence note if language detection was auto or if source was plain transcript with interpolated timing.

all files are plain text, no binary, no burn-in, no audio.

outcome signal

user knows the skill worked when:

the output .srt file opens in any subtitle editor or media player and displays properly (correct timecodes, no garbled text, sequential numbering).
every subtitle cue ends at a real punctuation mark (period, comma, colon, etc. in the target language), never mid-word or mid-clause.
reading the subtitles on-screen matches the original speaker's pacing and tone (not stiff, not over-literal, not too fast or slow).
if bilingual, both source and target are readable on screen at the same time (neither line is crushed; typical max is 2 lines per cue, 1 source + 1 target).
if the output feeds into /wjs-burning-subtitles or /wjs-dubbing-video, those skills accept the SRT without re-segmentation or timing fixes.
proper nouns (names, brands, places) are spelled correctly and consistently across cues.
a spot-check of 5-10 random cues shows natural-sounding translation (not word-for-word, not machine-like).
