---
name: skill-auditor
description: Periodically audit all workspace skills, learnings, memory, and configuration files to recommend refactoring, new skill ideas, and workflow improvements. Triggered automatically via cron every 7 days, or manually with "audit skills", "skill review", "workspace health", or "improve workflow". Sends recommendations directly to Telegram without user prompting.
---
# Skill Auditor
Automated weekly workspace health check. Evaluates skills, learnings, memory, and config files. Delivers actionable recommendations to Telegram.
## Pipeline architecture
4-phase sequential pipeline with internal parallelism:
### Phase 1: Digest (`opencode-go/kimi-k2.5`)
Ingest all workspace files in one long-context call:
- `skills/*/SKILL.md` and associated scripts/tests
- `.learnings/LEARNINGS.md`, `ERRORS.md`, `FEATURE_REQUESTS.md`
- `SOUL.md`, `AGENTS.md`, `USER.md`, `TOOLS.md`, `MEMORY.md`, `HEARTBEAT.md`
- recent `memory/*.md` files (last 14 days)
Output: `audit-state.json` with per-file summaries, staleness scores, overlap detection, gap analysis.
Optimization: hash watched files against `state.json` from last run. Skip unchanged files to prevent token burn.
Also: `web_search` for best practices relevant to detected gaps.
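The hash-based skip described above could be sketched roughly as follows. This is a minimal illustration, not the contents of `build_audit_state.py`: the function names and the flat `{"<path>": "<sha256>"}` layout of `state.json` are assumptions.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths: list[Path], state_path: Path) -> tuple[list[Path], dict[str, str]]:
    """Compare current hashes with the hashes saved by the previous run.

    Returns (files that are new or modified, the updated hash map).
    Assumed state layout: {"<path>": "<sha256 hex digest>"}.
    """
    previous: dict[str, str] = {}
    if state_path.exists():
        previous = json.loads(state_path.read_text(encoding="utf-8"))

    current = {str(p): file_digest(p) for p in paths if p.is_file()}
    changed = [Path(name) for name, digest in current.items() if previous.get(name) != digest]
    return changed, current


def save_state(state_path: Path, hashes: dict[str, str]) -> None:
    """Persist the hash map so the next run can skip unchanged files."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(hashes, indent=2, sort_keys=True), encoding="utf-8")
```

Only the files returned in `changed` would need to go into the long-context call; summaries for the rest could be carried over from the previous `audit-state.json`.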
### Phase 2: Evaluate (parallel)
**Phase 2A** (`opencode-go/glm-5`): Score each skill on effectiveness, token efficiency, coverage, staleness, overlap, alignment with USER.md goals. Propose new skill ideas.
**Phase 2B** (`openai-codex/gpt-5.3-codex`): Score independently. Generate concrete refactor proposals. Propose new skill ideas.
Both output structured evaluation JSON.
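One way to run the two evaluators side by side is a small thread pool, sketched below. The `run_evaluators` helper and the two stub functions are hypothetical stand-ins; the real model calls and their output schema are not specified here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Evaluator = Callable[[dict], dict]


def run_evaluators(audit_state: dict, evaluators: dict[str, Evaluator]) -> dict[str, dict]:
    """Run every evaluator on the same audit state concurrently.

    Threads are used on the assumption that each evaluator mostly waits on a model API call.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(evaluators))) as pool:
        futures = {name: pool.submit(fn, audit_state) for name, fn in evaluators.items()}
        # .result() re-raises any exception from an evaluator, so failures surface here.
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stubs standing in for the Phase 2A and 2B model calls.
def evaluate_2a(state: dict) -> dict:
    return {"evaluator": "glm-5", "files_scored": len(state["files"])}


def evaluate_2b(state: dict) -> dict:
    return {"evaluator": "gpt-5.3-codex", "files_scored": len(state["files"])}


if __name__ == "__main__":
    state = {"files": ["skills/context-optimizer/SKILL.md"]}
    print(run_evaluators(state, {"2A": evaluate_2a, "2B": evaluate_2b}))
```

Because neither evaluator sees the other's output, their scores stay independent until Phase 3 compares them.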
### Phase 3: Judge (`openai-codex/gpt-5.4`)
Receives: `audit-state.json` + both evaluation outputs.
- Cross-validate proposals, resolve conflicts
- Filter: only recommend changes with clear ROI
- Classify each recommendation:
  - 🟢 **safe refactor**: low-risk, can PR directly after approval
  - 🟡 **needs review**: structural change or new skill creation
  - 🔴 **informational**: trend or observation, no action yet
- Confidence threshold: ≥0.7 to recommend, ≥0.85 for safe-refactor classification
Output: `final-recommendations.json`
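The two confidence thresholds can be enforced with a small filter like the sketch below. One point is an assumption: a "green" proposal that clears 0.7 but not 0.85 is downgraded to "yellow" here rather than dropped, since the rules above do not say which should happen.

```python
RECOMMEND_MIN = 0.70      # below this, the recommendation is dropped
SAFE_REFACTOR_MIN = 0.85  # below this, "green" is not allowed


def apply_thresholds(recommendations: list[dict]) -> list[dict]:
    """Filter and reclassify recommendations by confidence.

    Assumption: "green" items with confidence in [0.70, 0.85) become "yellow".
    """
    kept = []
    for rec in recommendations:
        confidence = rec["confidence"]
        if confidence < RECOMMEND_MIN:
            continue  # not confident enough to recommend at all
        severity = rec["severity"]
        if severity == "green" and confidence < SAFE_REFACTOR_MIN:
            severity = "yellow"  # too uncertain for the safe-refactor class
        kept.append({**rec, "severity": severity})  # copy; the input is not mutated
    return kept
```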
### Phase 4: Deliver (main session)
Format recommendations as Telegram message and send. Archive to `memory/audits/YYYY-MM-DD.json`.
## Recommendation format
Each recommendation:
```json
{
  "id": "rec-001",
  "type": "refactor | new-skill | config-update | deprecate | merge",
  "severity": "green | yellow | red",
  "target": "skills/context-optimizer/SKILL.md",
  "title": "compress context-optimizer references section",
  "rationale": "...",
  "proposed_action": "...",
  "confidence": 0.87,
  "agreed_by": ["glm-5", "gpt-5.3-codex"]
}
```
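A record in this shape can be checked before it reaches the delivery step. The validator below is a sketch: the function name is hypothetical, and the 0 to 1 range for `confidence` is inferred from the thresholds rather than stated.

```python
ALLOWED_TYPES = {"refactor", "new-skill", "config-update", "deprecate", "merge"}
ALLOWED_SEVERITIES = {"green", "yellow", "red"}
REQUIRED_FIELDS = (
    "id", "type", "severity", "target", "title",
    "rationale", "proposed_action", "confidence", "agreed_by",
)


def validate_recommendation(rec: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is well-formed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in rec]
    if problems:
        return problems  # the value checks below need every field present

    if rec["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {rec['type']}")
    if rec["severity"] not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {rec['severity']}")

    confidence = rec["confidence"]
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number between 0 and 1")  # assumed range

    if not isinstance(rec["agreed_by"], list):
        problems.append("agreed_by must be a list of evaluator names")
    return problems
```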
## Telegram delivery format
```
Weekly Skill Audit - YYYY-MM-DD
🟢 Safe refactors (N):
1. [title] - [one-line action]
🟡 Needs review (N):
2. [title]
🔴 Informational (N):
3. [title]
Reply with a number for details, or "approve 1,2" to greenlight.
```
If no strong recommendations: send "no action needed this week" one-liner.
If quality score is low across all recommendations: send nothing.
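A formatter along these lines could produce the message above. It is a sketch, not the actual `format_telegram.py`: the function name is hypothetical, empty sections are skipped as an assumption, and the "send nothing" case is left to the caller because the quality score is not defined here.

```python
SECTIONS = [
    ("green", "🟢 Safe refactors"),
    ("yellow", "🟡 Needs review"),
    ("red", "🔴 Informational"),
]


def format_audit_message(recommendations: list[dict], audit_date: str) -> str:
    """Build the Telegram text, numbering items continuously across sections."""
    if not recommendations:
        return "no action needed this week"

    lines = [f"Weekly Skill Audit - {audit_date}"]
    number = 1
    for severity, heading in SECTIONS:
        group = [rec for rec in recommendations if rec["severity"] == severity]
        if not group:
            continue  # assumption: sections with no items are omitted
        lines.append(f"{heading} ({len(group)}):")
        for rec in group:
            if severity == "green":
                # Only safe refactors carry the one-line action in the template.
                lines.append(f"{number}. {rec['title']} - {rec['proposed_action']}")
            else:
                lines.append(f"{number}. {rec['title']}")
            number += 1
    lines.append('Reply with a number for details, or "approve 1,2" to greenlight.')
    return "\n".join(lines)
```

Continuous numbering keeps replies such as "approve 1,2" unambiguous, since no two items share a number.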
## Scheduling
**Primary:** OpenClaw cron, every 7 days (Sunday 10:00 AM ET):
```
openclaw cron add --schedule "0 10 * * 0" --model openai-codex/gpt-5.4 --label skill-auditor-weekly --prompt "Read skills/skill-auditor/SKILL.md and execute the full audit pipeline. Deliver results to Telegram."
```
**State tracking:** `memory/audits/last-run.json` records last execution timestamp. Heartbeat checks if last run was >10 days ago and alerts.
**Manual trigger:** User says "audit skills", "skill review", "workspace health", or "improve workflow".
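The heartbeat's 10-day staleness check might look like the sketch below. The schema of `last-run.json` is not specified, so the `last_run` key holding an ISO 8601 timestamp is an assumption, as is treating a missing file as overdue.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

MAX_AGE = timedelta(days=10)


def audit_is_overdue(last_run_path: Path, now: Optional[datetime] = None) -> bool:
    """Return True when the last audit is missing or older than MAX_AGE.

    Assumed file layout: {"last_run": "2025-01-05T15:00:00+00:00"}.
    """
    now = now or datetime.now(timezone.utc)
    if not last_run_path.exists():
        return True  # assumption: never having run counts as overdue

    record = json.loads(last_run_path.read_text(encoding="utf-8"))
    last_run = datetime.fromisoformat(record["last_run"])
    if last_run.tzinfo is None:
        last_run = last_run.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    return now - last_run > MAX_AGE
```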
## Evaluation criteria
Each file/skill scored on:
1. **Effectiveness**: achieves stated purpose? (1-5)
2. **Token cost**: bloated? shorter without losing value? (1-5)
3. **Coverage**: workflow gaps not addressed by any skill? (binary + description)
4. **Freshness**: last meaningful update vs relevance decay
5. **Overlap**: duplicates content in another file/skill? (list pairs)
6. **Alignment**: matches USER.md goals and SOUL.md persona? (1-5)
## Safety rules
- No automatic file edits. Recommendations are advisory until approved.
- Green recommendations produce diff previews; actual changes require explicit "approve" reply.
- Respect all workspace GitHub handling rules: no repo-visible changes without Omar's approval.
## File structure
```
skills/skill-auditor/
├── SKILL.md
├── scripts/
│   ├── build_audit_state.py
│   ├── merge_evaluations.py
│   └── format_telegram.py
└── tests/
    ├── test_build_audit_state.py
    ├── test_merge_evaluations.py
    └── test_format_telegram.py
```
Runtime artifacts (not tracked in repo):
```
memory/audits/
├── last-run.json
├── YYYY-MM-DD.json
└── state.json    (file hashes for change detection)
```
## Validation checklist
1. All 3 helper scripts exist and pass unit tests.
2. Dry-run mode completes full pipeline without sending messages.
3. At least one real audit cycle delivers a well-formatted Telegram message.
4. Recommendations are advisory-only (no auto-edits without approval).
5. Unchanged files are skipped via hash comparison.
6. Confidence thresholds are enforced.