Transform, format, and process text with patterns for writing, data cleaning, localization, citations, and copywriting.
---
name: Text
description: Transform, format, and process text with patterns for writing, data cleaning, localization, citations, and copywriting.
---
## Quick Reference
| Task | Load |
|------|------|
| Creative writing (voice, dialogue, POV) | `writing.md` |
| Data processing (CSV, regex, encoding) | `data.md` |
| Academic/citations (APA, MLA, Chicago) | `academic.md` |
| Marketing copy (headlines, CTA, email) | `copy.md` |
| Translation/localization | `localization.md` |
---
## Universal Text Rules
### Encoding
- **Always verify encoding first:** `file -bi document.txt`
- **Normalize line endings:** `tr -d '\r'`
- **Remove BOM if present:** `sed -i '1s/^\xEF\xBB\xBF//'`
### Whitespace
- **Collapse multiple spaces:** `sed 's/[[:space:]]\+/ /g'`
- **Trim leading/trailing:** `sed 's/^[[:space:]]*//;s/[[:space:]]*$//'`
### Common Traps
- **Smart quotes** (`"` `"`) break parsers → normalize to `"`
- **Em/en dashes** (`–` `—`) break ASCII → normalize to `-`
- **Zero-width chars** invisible but break comparisons → strip them
- **String length ≠ byte length** in UTF-8 (`"café"` = 4 chars, 5 bytes)
---
## Format Detection
```bash
# Detect encoding
file -I document.txt
# Detect line endings
cat -A document.txt | head -1
# ^M at end = Windows (CRLF)
# No ^M = Unix (LF)
# Detect delimiter (CSV/TSV)
head -1 file | tr -cd ',;\t|' | wc -c
```
---
## Quick Transformations
| Task | Command |
|------|---------|
| Lowercase | `tr '[:upper:]' '[:lower:]'` |
| Remove punctuation | `tr -d '[:punct:]'` |
| Count words | `wc -w` |
| Count unique lines | `sort -u \| wc -l` |
| Find duplicates | `sort \| uniq -d` |
| Extract emails | `grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'` |
| Extract URLs | `grep -oE 'https?://[^[:space:]<>"{}|\\^`\[\]]+'` |
---
## Before Processing Checklist
- [ ] Encoding verified (UTF-8?)
- [ ] Line endings normalized
- [ ] Delimiter identified (for structured text)
- [ ] Target format/style defined
- [ ] Edge cases considered (empty, Unicode, special chars)
don't have the plugin yet? install it then click "run inline in claude" again.