Advanced English polish skill for non-native writing. Nativization, style consistency, diff output. CLI scripts: detector.py + polish.py
---
name: "english-polish"
description: "Advanced English polish skill for non-native writing. Nativization, style consistency, diff output. CLI scripts: detector.py + polish.py"
version: 1.0.0
---
# english-polish — Professional English Polish & Nativization Skill
## Overview
Polish English text for native-level fluency while preserving the author's voice, argument structure, and analytical depth. Designed for technical/analytical non-fiction writing by non-native speakers.
**Core principle:** We are not translating Chinese to English. We are **nativizing** English that was written by a non-native speaker — making it flow naturally while preserving every fact, judgment, and nuance.
## When to Use
1. Non-native English manuscript chapters (like the 算力简史 project)
2. English-language analysis/reports written with Chinese syntax patterns
3. Any English output intended for native English-speaking readers
## Never Use For
- Creative writing (poetry, fiction, dialogue-heavy narrative)
- Academic papers requiring formal citation style preservation
- Translation from scratch (this is polish, not translate)
---
## CLI Tools
This skill ships with two Python scripts that automate Phase 0 detection and Phase 1-3 polishing:
### Installation
```bash
# Prerequisites
python3 -c "import re" # stdlib, no deps needed
```
### Detector: `detector.py`
```bash
# Check a file (default: plain text)
python3 scripts/detector.py check document.md
# HTML report (with score bar + table)
python3 scripts/detector.py check document.md --format html
# ANSI-colored terminal report
python3 scripts/detector.py check document.md --format ansi
# Batch scan a directory + JSON report
python3 scripts/detector.py batch /path/to/chapters/
# HTML batch (one HTML file per chapter)
python3 scripts/detector.py batch /path/to/chapters/ --format html --output /tmp/reports/
# Interactive mode (type or paste text)
python3 scripts/detector.py interactive
# List all 19 pattern categories
python3 scripts/detector.py list-patterns
```
**Output formats:**
- `text` (default): Plain text report (same format as above)
- `html`: Beautiful HTML with score bar, severity colors, pattern table
- `ansi`: Terminal-colored report for quick scanning
**Output sample:**
```
============================================================
PHASE 0 DETECTOR REPORT — chapter-09.md
============================================================
Word count: 4820
Total issues (wtd): 24.4
Density (/200 words): 1.0
Score: 4.9/5.0
Severity: Native-level
Detected patterns (5 categories):
[Noun Stacking (5+ consec noun/mod)] ×8 (weighted: 5.6)
→ "capital expenditure on cloud infrastructure"
→ "chip design competition is fierce"
['Not only...but also' overuse] ×2 (weighted: 1.2)
→ "Not only...but also"
['Should' false obligation] ×3 (weighted: 2.1)
→ "China should"
→ "Companies should"
...
```
### Polisher: `polish.py`
```bash
# Check + report (no file modification)
python3 scripts/polish.py check document.md
# Colorized check output
python3 scripts/polish.py check document.md --format ansi
# Polish file (creates output + diff report)
python3 scripts/polish.py polish document.md --output output.md
# Polish with HTML summary report
python3 scripts/polish.py polish document.md --output output.md --format html
# Style tweaks only (skip Phase 3 structural changes)
python3 scripts/polish.py polish document.md --style-only
# Batch polish a directory
python3 scripts/polish.py batch /path/to/chapters/
```
**Polish output includes:**
- Phase 1: Grammar & mechanics fixes (articles, prepositions, tenses)
- Phase 2: Fluency/Chinglish pattern replacement (18 categories)
- Phase 3: Style enhancement (sentence rhythm, word choice)
- Diff: side-by-side or inline before/after
- Report: score improvement, changes summary
### Makefile (CI Automation)
```bash
# Quick test suite only
make check
# Full test + book baseline + regression check
make test
# Run full-book baseline and compare with saved reference
make baseline
# Save current baseline as reference for future comparisons
make save-baseline
# Clean temp files
make clean
```
`make baseline` runs the detector on the full book and compares against the saved reference. If the average score changes by more than ±0.1, it flags the change for review.
### Custom Configuration (`config/`)
Extend the skill without modifying code:
```bash
# Copy and customize
cp config/default.json config/user.json
```
**user.json** supports adding custom detection patterns and polish replacement rules:
```json
{
"user_custom_patterns": [
{
"key": "Z_my_custom_pattern",
"label": "My Custom Pattern",
"weight": 1.0,
"patterns": ["\\bmy regex pattern here\\b"],
"examples": ["Example text"],
"notes": "What to look for"
}
],
"user_custom_replacements": [
{
"type": "filler",
"old": "my custom phrase",
"new": "better phrase",
"enabled": true
},
{
"type": "weak_verb",
"old": "perform an evaluation",
"new": "evaluate",
"enabled": true
}
]
}
```
---
## Architecture
### Three-Phase Polish Pipeline
```
Phase 0: Detection ────→ Score (0-5)
│
┌──────┴──────┐
│ │
score < 3 score ≥ 3
│ │
Phase 1: Phase 2:
Full Grammar Light Polish
+ Syntax (patterns
Restructure only)
│ │
Phase 2: Phase 3:
Chinglish Style-only
Replacement (word-level)
│ │
Phase 3: │
Style Pass (skip Phase 1)
│ │
Diff + Report
```
### 19 Pattern Categories (A-S)
| Code | Pattern | Weight | Example |
|:----|:--------|:------:|:--------|
| A | Subject Dangling | 1.0 | "For chip design, competition is fierce..." |
| B | Redundant "As For" / "Regarding" | 0.8 | "As for the energy cost, it is..." |
| C | Passive Chain Stacking | 1.0 | "It can be seen that..." |
| D | "Make + abstract noun" | 0.8 | "make a decision" → "decide" |
| E | "Not only...but also" overuse | 0.6 | default connector overuse |
| F | Noun Stacking (5+ consec noun/mod) | 0.7 | "Data center GPU market growth prediction models" |
| G | "Should" false obligation | 0.7 | "China should develop its own chip..." |
| H | "The" overuse for generic | 0.5 | "the computing power is..." |
| I | Chinese filler expressions | 0.7 | "to a certain extent", "in the process of" |
| J | "This is because" structure | 0.6 | Chinese-pattern explanatory |
| K | Redundant modifiers | 0.6 | "actively strengthen", "continuously improve" |
| L | Weak verb + nominalization | 0.7 | "conduct research" → "research" |
| M | Emphasis inversion (Chinese '是...的') | 0.7 | "It is X that..." → "X..." |
| N | List cadence enumeration | 0.5 | "First...Second...Finally" aggregation |
| O | Concessive cluster (虽然...但是) | 0.8 | "Although...but..." double conjunction |
| P | Topic-comment fronting | 0.7 | "As for X... / When it comes to..." |
| Q | False inclusive "We" | 0.5 | "We can see/observe/conclude..." |
| R | Logical sawtooth contrast | 0.5 | "On one hand...on the other hand..." |
| S | List summarizer | 0.5 | "All these factors contribute to..." |
### Scoring
- **0-5 scale**: normalized from weighted detection density
- **5.0**: Perfect (0 issues) or native-level
- **4.5+**: Native-level (minor style tweaks only)
- **4.0-4.5**: Near-native (phrase-level polish)
- **3.0-4.0**: Light polish needed (pattern-level)
- **2.0-3.0**: Full polish needed (sentence-level)
- **<2.0**: Heavy rewrite needed (structural issues)
**Scoring formula (per 200 words):**
- 0 weighted issues → score 5.0
- 1-3 medium → score 4.0-4.5
- 4-8 heavy → score 3.0-4.0
- >8 → score drops below 3.0
---
## Workflow
### Phase 0: Detection
Run automated detection before deciding to polish:
```bash
# Quick check
python3 scripts/detector.py check chapter.md
# Decision matrix:
# Score < 3: Full polish (Phase 1 → 2 → 3)
# Score 3-4: Light polish (Phase 2 → 3)
# Score 4+: Style-only (Phase 3)
```
### Phase 1: Full Chapter Polish (score < 3)
For each chapter:
1. **Read-through**: Understand argument structure, key claims, data points
2. **Sentence-by-sentence reform**: Fix grammar, syntax, flow WITHOUT changing:
- Any factual claim
- Any analytical judgment
- The chapter's core argument structure
- Technical terms (DO NOT rename "chiplet" → "modular chip" etc.)
3. **Style pass**: Check for
- Overuse of "However," at sentence start (replace with "But" or restructure)
- Overuse of "In the previous chapter..." transitions (merge into judgments)
- Chinese-pattern "Not only...but also..." (use sparingly)
- "This is because..." → restructure
- "Therefore, we can see that..." → delete, just state the conclusion
4. **Diff output**: Produce markdown diff showing before/after
### Phase 2: Light Polish (score 3-4)
Target specific patterns only:
- Article corrections (a/an/the)
- Preposition fixes
- Verb tense adjustments
- Comma splice repairs
- Chinglish pattern replacement (see 19-pattern library)
### Phase 3: Style-Only (score 4+)
- Rhythm adjustments (vary sentence length)
- Passive→active where natural
- Word choice: elevate from "correct but flat" to "natural and precise"
---
## Quality Gates
### Gate 1: Content preservation check
- Every data point in the original must still appear in the polished version
- Every core argument must be present
- Nothing added that changes meaning
### Gate 2: Nativization check
- Read aloud test: does it sound like a native speaker wrote it?
- If you stumble on reading, mark and fix
- No Chinese syntax traces: subject-drop, topic-comment structure, redundant "the" usage
### Gate 3: Style consistency check
- Tone matches the original author's voice (not overwritten with a new style)
- Technical terms preserved
- The polish feels invisible — reader shouldn't notice "this was translated" or "this was AI-polished"
---
## Output Format
For each polished chapter:
```markdown
# Chapter X: Title
## Polish Summary
- Original word count: X
- Changed words: X
- Sentences rewritten: X
- Nativization score: before X / after Y
## Changes
### Major Restructures (>50% rewritten)
1. [Original sentence]
→ [Polished sentence]
Reason: [syntax pattern detected + fix]
### Moderate Changes (sentence-level)
1. [Original fragment]
→ [Polished fragment]
### Word-Level Tweaks
1. [original word] → [new word]: reason
## Critical Notes
- Any factual concerns found during polish
- Any argument structure notes for author review
- Questions about intended meaning (for ambiguous phrases)
```
## Integration with Star1
When used for 算力简史 polishing:
1. Reference StarKB for technical terms that should NOT be changed
2. Preserve the Wang Jialin-style analytical voice (judgments before details, facts before emotions)
3. Do NOT:
- Add unnecessary hedging
- Soften strong judgments
- Remove skepticism/critical analysis
- Convert "this is bad" to "this may be considered challenging"
## Dependencies
- Python 3 (stdlib only — `re`, `sys`, `json`)
- Star1 or any advanced LLM (for sentence rewriting)
- `diff` command (for before/after comparison)
- No external API required
## Directory Structure
```
english-polish/
├── SKILL.md ← This file
├── Makefile ← CI targets (test, baseline, check, clean)
├── config/
│ ├── default.json ← Built-in configuration
│ ├── user.json ← User custom patterns & replacements (optional)
│ ├── baseline-reference.json ← Saved book baseline for regression checking
│ └── save-baseline.py ← Baseline compare/save script
├── scripts/
│ ├── detector.py ← Phase 0: Chinglish pattern detection (19 classes)
│ ├── detector.py.bak ← Backup of original 12-class version
│ └── polish.py ← Phase 1-3: full polish pipeline
├── tests/
│ ├── test_suite.py ← Automated test suite (29 test cases)
│ └── known_chinglish.md ← 19-pattern test corpus
└── references/
└── (optional: comparison reports, benchmark data)
```
don't have the plugin yet? install it then click "run inline in claude" again.