PDF Sanitizer

Detect and redact sensitive information in PDFs — ID numbers, phone numbers, addresses, bank cards.

SkillRank score: 6.9/10

evaluated by implexa, claude-haiku-4-5 · 2026-07-04

pdf-sanitizer detects and redacts PII in PDF documents using regex and pattern matching. It supports three redaction modes (blackout, blur, placeholder), asks the user to confirm categories before redacting, and produces a structured report.

structure: 8.0
trigger phrases: 7.0
procedure: 7.0
edge cases: 4.0
documentation: 7.0

SKILL.md

---
name: pdf-sanitizer
description: "Detect and redact sensitive information in PDFs — ID numbers, phone numbers, addresses, bank cards."
metadata:
  category: Document Processing
  priority: P0
  languages: zh-CN, en
---

# PDF Sanitizer

Detect and redact sensitive information in PDF documents while preserving original layout.

## Workflow

1. **Ingest PDF** — extract text layer and metadata via pdfplumber/PyMuPDF.
2. **Scan for PII** — run regex + AI pattern matching against Chinese and international PII:
   - Chinese resident ID numbers (18 characters: 17 digits plus a check character, which may be `X`)
   - Chinese phone numbers
   - Bank card numbers
   - Email addresses
   - Residential addresses (Chinese)
   - Person names (context-based)
3. **Highlight** — annotate every match with bounding boxes and category labels.
4. **Confirm** — present categories to user for selection. Default: all categories enabled.
5. **Redact** — apply chosen mode per category:
   - `blackout` — solid black rectangle over sensitive text
   - `blur` — pixel-level Gaussian blur on image-rendered area
   - `placeholder` — replace with `[REDACTED]` while keeping surrounding text
6. **Rebuild PDF** — flatten redactions into final output, preserving original fonts, images, and layout.
7. **Report** — output redacted PDF + JSON report listing each redaction:
   - original snippet (truncated), category, page number, bounding box, mode applied.
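
The regex part of step 2 can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the patterns, the `scan_text` helper, and the `bank_card` and `email` category names are assumptions (only `id_card`, `phone`, and `address` appear in the sample commands). Checksum validation (the standard MOD 11-2 check character for resident ID numbers, Luhn for bank cards) is added here to cut false positives. Addresses and person names are left out because they need context rather than a fixed pattern.

```python
import re
from dataclasses import dataclass

# Weights and check characters for the 18-character resident ID checksum.
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
ID_CHECK_CHARS = "10X98765432"


def valid_cn_id(value: str) -> bool:
    """Return True if the 18th character matches the weighted sum of the first 17 digits."""
    total = sum(int(d) * w for d, w in zip(value[:17], ID_WEIGHTS))
    return ID_CHECK_CHARS[total % 11] == value[17].upper()


def luhn_ok(value: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(value)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Hypothetical pattern table: category -> (regex, optional validator).
# Lookarounds stop a pattern from matching inside a longer digit run.
# An all-digit 18-character ID can also match the bank-card pattern; Luhn filters most of those.
PATTERNS = {
    "id_card": (re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)", re.ASCII), valid_cn_id),
    "phone": (re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)", re.ASCII), None),
    "bank_card": (re.compile(r"(?<!\d)\d{16,19}(?![\dXx])", re.ASCII), luhn_ok),
    "email": (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", re.ASCII), None),
}


@dataclass(frozen=True)
class Match:
    category: str
    text: str
    start: int
    end: int


def scan_text(text: str) -> list[Match]:
    """Run every pattern over the text and keep matches that pass their validator."""
    found = []
    for category, (pattern, validator) in PATTERNS.items():
        for m in pattern.finditer(text):
            if validator is None or validator(m.group()):
                found.append(Match(category, m.group(), m.start(), m.end()))
    return sorted(found, key=lambda match: match.start)
```

Character offsets returned by `scan_text` would still need to be mapped back to page coordinates by the extraction library before step 3 can draw bounding boxes.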
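
One possible shape for the JSON report in step 7 is sketched below. The field names, the `Redaction` record, and the `truncate` and `build_report` helpers are hypothetical; the source only states which pieces of information each entry carries.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Redaction:
    snippet: str   # truncated original text
    category: str  # e.g. "phone"
    page: int      # 1-based page number
    bbox: tuple    # (x0, y0, x1, y1) in PDF points
    mode: str      # "blackout", "blur", or "placeholder"


def truncate(text: str, keep: int = 3) -> str:
    """Keep only the first `keep` characters so the report does not repeat the full value."""
    return text[:keep] + "…" if len(text) > keep else text


def build_report(source: str, redactions: list[Redaction]) -> str:
    """Serialize all redactions for one input file as a JSON string."""
    payload = {
        "source": source,
        "total": len(redactions),
        "redactions": [asdict(r) for r in redactions],
    }
    # ensure_ascii=False keeps Chinese file names and snippets readable.
    return json.dumps(payload, ensure_ascii=False, indent=2)


entry = Redaction(truncate("13800138000"), "phone", 3, (72.0, 90.5, 210.0, 104.0), "blackout")
report = build_report("contract.pdf", [entry])
```

Truncating the snippet before it enters the report matters because the report itself would otherwise become a second copy of the sensitive data.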

## Sample Prompt

```
pdf-sanitizer redact --input contract.pdf --categories id_card,phone,address --mode blackout
pdf-sanitizer redact --input 社保材料.pdf --output clean.pdf --categories all --mode placeholder
pdf-sanitizer scan --input report.pdf
pdf-sanitizer review --input contract.pdf --page 3-7
```
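
To show how the flags in these commands could be interpreted, here is a small `argparse` sketch. It is an assumption about the command grammar, not the skill's real parser: the `bank_card`, `email`, and `name` category names, the `blackout` default mode, and the helper functions are all hypothetical. The all-categories default follows step 4 of the workflow.

```python
import argparse

# Hypothetical category identifiers; only the first three appear in the sample commands.
CATEGORIES = ("id_card", "phone", "address", "bank_card", "email", "name")
MODES = ("blackout", "blur", "placeholder")


def parse_categories(value: str) -> list[str]:
    """Turn 'all' or a comma-separated list into a validated list of categories."""
    if value == "all":
        return list(CATEGORIES)
    chosen = value.split(",")
    unknown = [c for c in chosen if c not in CATEGORIES]
    if unknown:
        raise argparse.ArgumentTypeError(f"unknown categories: {', '.join(unknown)}")
    return chosen


def parse_pages(value: str) -> range:
    """Turn '5' or '3-7' into an inclusive range of 1-based page numbers."""
    first, _, last = value.partition("-")
    start = int(first)
    end = int(last) if last else start
    if start < 1 or end < start:
        raise argparse.ArgumentTypeError(f"invalid page range: {value}")
    return range(start, end + 1)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdf-sanitizer")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("scan", "review", "redact"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--input", required=True)
        cmd.add_argument("--page", type=parse_pages, default=None)
        # All categories are enabled unless the user narrows the selection.
        cmd.add_argument("--categories", type=parse_categories, default=list(CATEGORIES))
        if name == "redact":
            cmd.add_argument("--output")
            cmd.add_argument("--mode", choices=MODES, default="blackout")
    return parser
```

Under this sketch, `review --input contract.pdf --page 3-7` yields pages 3 through 7 inclusive, and an unknown category name is rejected at parse time rather than silently ignored.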
