Generic skill-quality auditor for any agent skill (Claude, OpenClaw, Cursor, etc.). Runs a 7-dimension static analysis (D1 process closure & idempotency, D2...
SKILL.md

---
name: skill-deep-audit
description: >
  Generic skill-quality auditor for any agent skill (Claude, OpenClaw,
  Cursor, etc.). Runs a 7-dimension static analysis (D1 process closure
  & idempotency, D2 tool/command conventions, D3 portability & defense,
  D4 skill usability, D5 security & op risk, D6 code & doc quality, D7
  dependency & footprint) with explicit ERR / WARN severity, 115-point
  scoring (pass line 90 + zero ERR), and an opt-in `--fix` workflow that
  always backs up first. Two depths: L1 static (~2 min) and L2 dryRun
  (~5 min, read-only hub + reachability checks). Strict red lines —
  read-only by default, never executes the audited skill's writes.
  Use when the user asks to "audit a skill", "check skill quality",
  "is this skill ready to ship", "lint my skill", or runs this tool by
  name. Triggers also: "审计这个 Skill"、"检查 Skill 质量"、"Skill 能上线吗"、
  "skill-deep-audit"、"审一下 xxx skill"。
---

# skill-deep-audit — Generic Skill Auditor

A read-only, multi-dimensional quality auditor for agent skills. Runs static
analysis + optional dryRun reachability checks and produces a scorecard.

- **Version**: 1.0.0
- **License**: MIT
- **Author**: Evan Song · [github.com/Songhonglei](https://github.com/Songhonglei)
- **Repository**: https://github.com/Songhonglei/build-better-skills
- **Part of**: `build-better-skills` suite (creation → audit → regression → sediment)

## Design principles

```
Can it run?            → D3 Portability + D4 Usability conventions
Does it run correctly? → D1 Process closure + D6 Code & doc quality
Is it safe to run?     → D5 Security & op risk
Is it well-conformed?  → D2 Tool & command conventions
Is the whole healthy?  → D7 Dependency & footprint
```

## NEVER DO

- Do **not** execute any write operation of the audited skill (read-only,
  reachability checks, and static analysis only).
- Do **not** modify the audited skill's files (read-only and report-only;
  `--fix` is the one exception and requires explicit user authorization).
- Do **not** fabricate check results — when undecidable, mark
  "cannot confirm, manual verification needed".

## MUST DO

- At the start of every audit, **ask the user to choose the check depth
  (L1 / L2 dryRun) and wait** for the choice.
- Any ERR → final result is FAIL, regardless of total score.
- After the audit, **write the scorecard MD file**.
- If an L2 dryRun check fails because an external dependency is
  unavailable, **downgrade to WARN** with the reason — do not abort.

---

## Dependencies of this skill (all soft — degrade gracefully if missing)

| Dependency | Purpose | Behavior if missing |
|-----------|---------|---------------------|
| A skill-hub query tool (e.g. `clawhub`) | Hub publish-status check (D7-W1) and dependency existence check (D7-W2 step 3) | Skip Hub checks, related items downgrade to WARN, do not abort the audit |

> The core of this skill is pure static / read-only analysis. **There are no
> hard external dependencies** — L1 static audit works even if no tooling is
> installed.

---

## Step 0: Ask for check depth

At the start of the audit, present the following options and **wait for the
user to choose explicitly**:

```
Please choose check depth:

L1 Static analysis (~2 min)
   File read, structural check, keyword scan, syntax check.
   Max 112 (skips items that need to touch external systems). Pass line ≥ 90.
   Good for: quick first-draft check.

L2 dryRun (~5 min, recommended) ⭐
   L1 + Hub existence check + dependency existence check + branch reachability
   simulation (file existence / env config / read-only verification of
   unhit branches).
   Max 115. Pass line ≥ 90.
   Good for: pre-release / pre-ship full acceptance.

Default recommendation: L2 dryRun. Reply 1 / L1 for static, 2 / L2 for dryRun
(empty Enter = L2).

⚠️ Note: L2 dryRun does ONLY read-only queries and reachability checks. It
   performs no writes / updates, and it does NOT actually run the audited
   skill's business workflow.
```

---

## Step 1: Locate the skill directory

The user may provide:
- A skill name (e.g. `my-skill`) → look for a same-named folder under
  `<skills-dir>/`. `<skills-dir>` is the agent's skills directory and may be
  e.g. `~/.claude/skills/`, `~/.openclaw/workspace/skills/`, or any path the
  user specifies. Some agents use a different layout — adjust to what's actually
  on disk.
- A relative or absolute path (e.g. `skills/my-skill/`) → use directly.

```bash
ls {skill-path}/
cat {skill-path}/SKILL.md | head -20
```

If the directory cannot be found → tell the user and stop. Do not guess.

---

## Step 2: Static analysis (runs at every depth)

Execute each check defined in
[references/check-rules.md](./references/check-rules.md) in order.

> ⚖️ **Determinism guarantee**: each rule's hit/miss decision uses the grep
> pattern, keyword list, and numeric thresholds defined for it in
> check-rules.md — **not** the agent's subjective judgement. Edge cases not
> explicitly covered by a rule are handled by the "False-Positive General
> Rules" section (marked "manual verification needed", not hard-judged). This
> guarantees stable, repeatable results across different agents / re-runs.

### 2.1 Collect file list

```bash
# Script extensions must cover mixed-language skills — .js/.cjs/.mjs/.ts cannot be missed
find {skill-path} -type f \( \
  -name "*.py" -o -name "*.sh" -o -name "*.md" -o -name "*.json" -o -name "*.yaml" \
  -o -name "*.js" -o -name "*.cjs" -o -name "*.mjs" -o -name "*.ts" \) \
  | grep -v __pycache__ | grep -v node_modules | grep -v .git
```

> ⚠️ **Extension coverage blind-spot**: `find -name "*.js"` does **not** match
> `.cjs` / `.mjs` / `.ts`. Python scripts often `subprocess`-call a sibling
> `xxx.cjs` — if the file list misses `.cjs`, the auditor will wrongly report
> "called script does not exist" (false positive on D6-E4 / D6-E6). All later
> extension-scoped scans must include the full set.

### 2.2 Per-dimension static scan

Execute in order: **D1 → D2 → D3 → D4 → D5 → D6**.

> ℹ️ **D7 is not in this step**: D7 (dependencies & footprint) needs the code
> stats (see 2.4) plus Hub / existence checks, so it is consolidated into
> **Step 4**. Step 2 only scans D1–D6.

**Execution-level convention**:
- Rule title contains `L1` → runs at all depths.
- Rule title contains `L2 dryRun` → runs only at L2 dryRun; for L1, mark as
  `➖ skipped (L2 dryRun item)`.
- **D4-E5 scan must exclude** `{skill-path}/AUDIT-*.md` — audit reports are
  produced by this tool itself and are not part of the audited skill's package.

For each rule:
1. Check whether its execution level matches the current depth; if not,
   record ➖ skipped.
2. Run the corresponding scan command (grep / regex / file parse /
   agent-read judgement).
3. Record: pass ✅ / fail ❌ / skipped ➖.
4. Accumulate deductions.

### 2.3 D6-E1 script syntax check

```bash
# Python
for f in $(find {skill-path}/scripts -name "*.py" 2>/dev/null); do
  python3 -m py_compile "$f" 2>&1 && echo "OK: $f" || echo "SYNTAX ERR: $f"
done

# Shell
for f in $(find {skill-path}/scripts -name "*.sh" 2>/dev/null); do
  bash -n "$f" 2>&1 && echo "OK: $f" || echo "SYNTAX ERR: $f"
done
```

### 2.4 Code-size stats (prerequisite for D7)

```bash
# Number of script files (covers mixed skills: .js/.cjs/.mjs/.ts)
find {skill-path}/scripts -type f \( -name "*.py" -o -name "*.sh" -o -name "*.js" -o -name "*.cjs" -o -name "*.mjs" -o -name "*.ts" \) 2>/dev/null | grep -v node_modules | wc -l

# Total line count (-r prevents hang on no-match)
find {skill-path} \( -name "*.py" -o -name "*.sh" -o -name "*.js" -o -name "*.cjs" -o -name "*.mjs" -o -name "*.ts" \) | grep -v node_modules | xargs -r wc -l 2>/dev/null | tail -1

# Skill-on-skill dependency: precise extraction (see D7-W2 "three-step join" algorithm)

# ① List all suspicious import candidates (just module names; ownership is resolved later)
grep -rnE "^\s*(from [a-zA-Z_][a-zA-Z0-9_]* import|import [a-zA-Z_][a-zA-Z0-9_]*)" {skill-path}/scripts/ 2>/dev/null
# ① supplementary: look for sys.path injection / skill_root concatenation
#    (this is the physical evidence of which skill an import belongs to)
grep -rnE "sys\.path\.insert.*skills/|_skill_root|skills/[a-z-]+/scripts" {skill-path}/scripts/ 2>/dev/null

# ② subprocess calls into other skills' scripts (by path)
grep -rnE "skills/[a-z-]+/scripts|_skill_root.*scripts" {skill-path} 2>/dev/null | grep -v __pycache__
# ③ Explicit declaration in SKILL.md
grep -nE "metadata.*requires|depends on .* skill|requires the .* skill|use .* skill" {skill-path}/SKILL.md 2>/dev/null
# → Agent then deduplicates, applies the three-step join to fix ownership, annotates purpose,
#   runs the existence check (D7-W2), and writes the result into report section
#   "VI. Skill Dependencies".
# → Stdlib and well-known PyPI packages (os/sys/json/re/requests/openpyxl …) are excluded
#   from ownership judgement.
```

---

## Step 3: Hub existence check (runs at L2 dryRun)

> **Pre-check**: this step requires a skill-hub query tool (e.g. `clawhub`).
> If unavailable → skip Hub checks; mark D7-W1 as "cannot verify (no hub
> tooling)", downgrade to WARN, do not abort.

1. Extract the `name` field from frontmatter.
2. Use the available skill-hub query tool to check whether the skill is
   already published.
3. Record the result against D7-W1 (not published → WARN, not ERR).

---

## Step 4: Dependency & footprint analysis (D7)

- **D7-W1** Hub publish status (consolidated from Step 3).
- **D7-W2** Precise dependency-skill list + purpose annotation + existence
  check (local ✅ / hub-has-not-installed ⚠️ / not-found ❌). Output a
  full list in report section "VI. Skill Dependencies" regardless of count;
  ≥ 5 deps → WARN; depending on a "not found ❌" skill → ERR; depending on a
  "hub-has-not-installed ⚠️" skill → WARN.
- **D7-W3** Code ≥ 5000 lines or scripts ≥ 10 → identify high-cohesion
  modules and suggest a split direction.

---

## Step 5: Aggregate scoring

**Total 115 points**

| Dimension | Max |
|-----------|-----|
| D1 Process closure & idempotency | 13 |
| D2 Tool & command conventions | 10 |
| D3 Portability & defense | 15 |
| D4 Skill usability conventions | 21 |
| D5 Security & op risk | 21 |
| D6 Code & doc quality | 31 |
| D7 Dependency & footprint health | 4 |
| **Total** | **115** |

> 📊 **Scoring convention**: ERR is uniformly 3 points (a hit means FAIL; the
> point value carries no real meaning). WARN uses three priority tiers
> (high 3 / mid 2 / low 1) — the difference is meant to guide fix order.

**Dual-judgement (both conditions must hold for PASS)**:

Pass line is uniformly 90 at both depths (skipped items don't count toward
the actual max but don't change the pass line):

| Depth | Actual max | Pass line |
|-------|-----------|-----------|
| L1 static | 112 | **≥ 90** |
| L2 dryRun | 115 | **≥ 90** |

| Condition | Result |
|-----------|--------|
| Total ≥ pass line **AND** zero ERR | ✅ **PASS** |
| Any ERR, **OR** total < pass line | ❌ **FAIL** |

---

## Step 6: Produce the scorecard MD file

Generate the full report using
[references/output-template.md](./references/output-template.md).

Write path: `{skill-path}/AUDIT-{YYYY-MM-DD}.md`

> `AUDIT-*.md` should not be packaged with the skill (D4-E5 will detect this).

---

## Step 7: Output summary

```
📋 Audit complete: {skill-name}
─────────────────────────────────────
Total score: {score}/{max}   {PASS ✅ / FAIL ❌}  (L1 max 112 / L2 dryRun max 115)
Pass line:   ≥ 90 (uniform across L1 / L2 dryRun)  AND  zero ERR (dual-judgement)
Depth:       {L1 static / L2 dryRun}

🔴 ERR: {n}   |   🟡 WARN: {n}
Highest-priority fix: {ID and name of the highest-deduction ERR}

Estimated score after fixing all ERR: {estimated}/{max}

📁 Scorecard: {skill-path}/AUDIT-{date}.md

🔧 Fix: {N} items auto-fixable / {M} items need human confirmation
   Reply "fix" to start auto-fix (the skill folder is backed up first).
```

---

## Step 8: Auto-fix (`--fix`) behavior spec

> ⚠️ **This is the only step in skill-deep-audit that is allowed to modify
> the audited skill's files**, and only after **explicit user
> authorization**. The day-to-day audit (Step 0–7) strictly observes the
> "audit-only, never fix" red line.

### Trigger conditions

- The user explicitly replies "fix", "apply fix", "`--fix`", "fix 5.1", etc.
  after the report is delivered.
- **Without explicit user authorization, never auto-fix.** The report only
  *recommends*; it does not *execute*.

### Fix scope tiers (corresponds to report section "V. Fix Recommendations")

| Sub-section | Type | Auto-fix? |
|-------------|------|-----------|
| **5.1 Auto-fixable** | Pure text / config / docs (add `version`, add prerequisites, edit wording, add dependency declaration, normalize reference prefixes — no business logic) | ✅ User says "fix" → batch apply |
| **5.2 Needs human confirmation** | Business logic / script code (change control flow, change field matching, change HTTP call, change column mapping, remove over-privileged steps) | ⚠️ Must confirm each item with the user; user approves one → fix one |

### Execution flow (strict order)

1. **Mandatory pre-fix backup**:
   - Copy the entire audited skill directory to a backup path:
     `{skill-path}.bak-{YYYYMMDD-HHMMSS}`
   - **Immediately tell the user the full backup path.**
   - If backup fails → abort the fix, don't touch anything.
   ```bash
   BACKUP="{skill-path}.bak-$(date +%Y%m%d-%H%M%S)"
   cp -r "{skill-path}" "$BACKUP" && echo "✅ Backed up to $BACKUP"
   ```

2. **Apply fixes item by item**:
   - 5.1 items: edit directly per the report's "③ Fix" section. After each
     change, briefly report `✅ Fixed [ID]`.
   - 5.2 items: only change after the user explicitly confirms that item;
     items the user hasn't approved are not touched.

3. **Do not auto-re-audit**:
   - After fixes, **prompt the user**:
     `🔧 Fixed {n} items. Re-run the audit now to verify? (reply "re-audit" to start)`
   - **Wait for the user to confirm "re-audit"** before re-running Steps 0–7.

4. **Fix record**: in the report or reply, list "which files / which items
   were changed + backup path" so the user can roll back.

### Red lines (also apply during fix)

- Do not execute any write operation.
- Do not delete any file (even if it looks redundant); if deletion is needed
  ask the user separately.
- 5.2 business-logic items are **never** changed unilaterally, even if they
  "look safe".
- If the user wants to roll back after fix: instruct the user to restore by
  copying the backup directory over.

---

## Part of build-better-skills

This skill is part of the
[build-better-skills](https://github.com/Songhonglei/build-better-skills)
suite — open-source skills that help you build better skills, end-to-end:

| Stage | Skill | Status | What it does |
|-------|-------|--------|--------------|
| Creation | `skill-creator` | 🚧 Not yet released | Scaffold a new skill from intent |
| **Audit** | **`glic-check`** | ✅ **v1.0.x** | **Fast, qualitative multi-dimension review (G/L/I/C + U) — run right after any edit** |
| **Audit** | **`skill-deep-audit`** | ✅ **v1.0.0** | **Comprehensive dryRun-level exam — 7 dimensions, 115-pt score, `--fix`** |
| Testing | `skill-regression` | 🚧 Not yet released | End-to-end regression testing |
| Sediment | `skill-sediment` | 🚧 Not yet released | Promote successful workflows into new skills |

Two complementary tools share the **Audit** stage:

- **`glic-check`** — lightweight, qualitative. Run it right after a change for a
  quick multi-dimension sanity review (no score). Best for tight edit loops.
- **`skill-deep-audit`** (this skill) — heavyweight, quantitative. A full
  dryRun-level evaluation that grades the skill on a 115-point scale with
  ERR/WARN findings and a scorecard. Best as a pre-ship "final exam".

Only `glic-check` and `skill-deep-audit` ship today. The other entries are
roadmap placeholders — they will appear in the suite repo as they are
open-sourced.

## Rule references

- Full rule decision logic → [references/check-rules.md](./references/check-rules.md)
- False-positive / boundary general rules → [references/check-rules.md "False-Positive General Rules" section](./references/check-rules.md)
- Controlled-domain config (D2-E1, default empty) → [references/controlled-domains.md](./references/controlled-domains.md)
- Report MD template → [references/output-template.md](./references/output-template.md)
skill-deep-audit

SKILL.md

related skills