---
description: check a SKILL.md for the 6 structural components tier-1 scoring looks at. produces a 0-10 score plus a checklist of missing components. trigger when an author wants to know why their skill ranks below others or asks "is this skill good".
---
# skill quality audit
audit any SKILL.md against the same structural rubric implexa's tier-1 scoring uses. catches missing components, weak trigger language, and structural gaps before they hurt your skill's ranking. useful for authors iterating on their own skills or reviewing teammates' submissions.
## intent
before publishing a skill (or after seeing one score lower than expected), check it against the 6-component rubric. the audit is structural only - it does not evaluate whether the skill works at runtime (that is tier-2). but structural completeness is what tier-1 scoring measures, and tier-1 drives the leaderboard rank.
## inputs
- the SKILL.md body as text (paste it or pass the path)
- optional: the slug, if you want to compare against a previously-saved version
## procedure
### step 1, parse the frontmatter
every well-formed SKILL.md starts with yaml frontmatter:
```
---
description: <one-line summary, includes trigger phrases>
---
```
audit the description:
- is it present?
- does it include "trigger when" or trigger-phrase language?
- is it under 280 chars (the embedding-input sweet spot)?
deduct points if the description is vague. a missing description is a hard fail (see decision points).
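the three checks above can be sketched in a few lines. this is a minimal sketch, not implexa's actual parser: the function name `audit_description` and the returned keys are hypothetical, and the frontmatter is read with a plain regex on the assumption that `description` is a single-line key.

```python
import re

MAX_DESCRIPTION_CHARS = 280  # limit taken from the checklist above


def audit_description(skill_text: str) -> dict:
    """Answer the three description questions for one SKILL.md (hypothetical helper)."""
    # Match a leading '---' ... '---' block at the very start of the file.
    match = re.match(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", skill_text, re.DOTALL)
    description = ""
    if match:
        for line in match.group(1).splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "description":
                description = value.strip()
                break
    return {
        "present": bool(description),
        # Assumption: any mention of "trigger" counts as trigger-phrase language.
        "has_trigger_language": "trigger" in description.lower(),
        "under_limit": bool(description) and len(description) < MAX_DESCRIPTION_CHARS,
    }
```

a multi-line or quoted yaml description would need a real yaml parser. the regex only covers the one-line form shown above.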
### step 2, check for each of the 6 components
scan the body for these headers (or close equivalents):
1. **intent**: what the skill exists to do, in 1-3 sentences. not the procedure, the why.
2. **inputs**: what the skill needs to run. tools, data, user context, prereqs.
3. **procedure**: numbered or stepped sequence of actions. each step states what to render or what to capture.
4. **decision points**: branches. "if X then Y, else Z" patterns. what to do when things are ambiguous.
5. **output contract**: what the skill produces. format, length, where it goes.
6. **outcome signal**: how to know it worked. what would success look like 7 days later.
assign 0-2 points per component. 0 = missing, 1 = present but thin, 2 = substantive.
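one way to automate a first pass of the 0-2 scoring is sketched below. it assumes each component lives under a `## <name>` header with the exact names listed above. the word-count cutoff between "thin" and "substantive" (`THIN_WORD_LIMIT`) is a made-up placeholder, and `score_components` is a hypothetical name. "close equivalents" and flat narrative bodies still need a human or model read.

```python
import re

COMPONENTS = ["intent", "inputs", "procedure", "decision points",
              "output contract", "outcome signal"]
THIN_WORD_LIMIT = 25  # hypothetical cutoff between "thin" and "substantive"


def score_components(body: str) -> dict:
    """Return 0-2 points per component based on '## <name>' sections."""
    # re.split with a capturing group yields [preamble, header, text, header, text, ...].
    parts = re.split(r"^##[ \t]+(.+?)[ \t]*$", body, flags=re.MULTILINE)
    sections = {parts[i].lower(): parts[i + 1] for i in range(1, len(parts) - 1, 2)}
    scores = {}
    for name in COMPONENTS:
        text = sections.get(name)
        if text is None:
            scores[name] = 0  # missing
        elif len(text.split()) < THIN_WORD_LIMIT:
            scores[name] = 1  # present but thin
        else:
            scores[name] = 2  # substantive
    return scores
```

summing the returned values gives the component total out of 12.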
### step 3, score the trigger phrases
look for explicit trigger phrases in the description or in a dedicated section. count distinct phrases (or example user messages). more is not always better - aim for 3-7 high-signal phrases that map to how real users would ask.
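the step does not say how a phrase count becomes points, so the sketch below invents a mapping purely for illustration: it counts distinct double-quoted phrases and awards the full 3 points only inside the recommended 3-7 range. both the function name `trigger_phrase_score` and the point thresholds are assumptions, not the tier-1 formula.

```python
import re


def trigger_phrase_score(description: str) -> int:
    """Map the number of distinct quoted phrases to 0-3 points (assumed mapping)."""
    # Only double-quoted example messages are counted; unquoted triggers are missed.
    phrases = {p.strip().lower() for p in re.findall(r'"([^"]+)"', description)}
    count = len(phrases)
    if count == 0:
        return 0
    if count < 3:
        return 1
    if count <= 7:
        return 3  # the 3-7 range the step recommends
    return 2  # more is not always better
```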
### step 4, scan for anti-patterns
deduct points for:
- **vague procedure** ("do X carefully"): step description without a concrete tool call
- **missing error handling**: no decision points for the common failure modes
- **no measurable outcome**: outcome signal is "user feels good" rather than something observable
- **bloat**: skill body over 8k chars without justification (the body gets truncated during embedding)
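three of these anti-patterns can be flagged with simple text checks, sketched below. the vague-word list, the use of a boolean as a proxy for error handling, and the name `find_anti_patterns` are all assumptions. "no measurable outcome" is left out because it needs a judgment call rather than a string match, and the size of each deduction is not specified here, so the sketch only returns names.

```python
BLOAT_LIMIT_CHARS = 8000  # "8k chars" from the list above, read as 8000
VAGUE_WORDS = ("carefully", "properly", "as needed")  # hypothetical word list


def find_anti_patterns(body: str, has_decision_points: bool) -> list[str]:
    """Return the names of anti-patterns detected by simple text checks."""
    found = []
    lowered = body.lower()
    if any(word in lowered for word in VAGUE_WORDS):
        found.append("vague procedure")
    if not has_decision_points:
        # Proxy only: a decision points section may still miss common failures.
        found.append("missing error handling")
    if len(body) > BLOAT_LIMIT_CHARS:
        found.append("bloat")
    return found
```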
### step 5, compute the score and render
sum the component points (max 12) and the trigger-phrase score (max 3), then subtract anti-pattern deductions. normalize to 0-10 by dividing by the 15-point maximum and multiplying by 10, floor at 0, and round to one decimal.
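as a worked sketch of the arithmetic, assuming "normalize" means scaling the 15-point raw maximum (12 + 3) onto 0-10 and clamping negatives to 0. the function name `audit_score` is hypothetical.

```python
RAW_MAX = 12 + 3  # component points plus trigger-phrase points


def audit_score(component_points: int, trigger_points: int, deductions: float) -> float:
    """Combine the step totals into a 0-10 score rounded to one decimal."""
    raw = component_points + trigger_points - deductions
    raw = max(0.0, min(raw, RAW_MAX))  # clamp so deductions cannot go below 0
    return round(raw / RAW_MAX * 10, 1)
```

for example, 8 component points, 2 trigger points, and 1 point of deductions give a raw 9 out of 15, which scales to 6.0.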
## decision points
- **the skill body has no headers at all**: it might be using a flat narrative style. parse for the content of each component instead of strict header matching. give partial credit.
- **multiple skills in one file**: split it into separate audits. one SKILL.md per skill is the right unit.
- **the description is missing entirely**: this is a hard fail (score capped at 4.0) because trigger matching breaks without it.
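the hard-fail rule is a cap applied after the normal score is computed. a minimal sketch, with `apply_hard_fail` as a hypothetical name:

```python
DESCRIPTION_MISSING_CAP = 4.0  # cap from the decision point above


def apply_hard_fail(score: float, description_present: bool) -> float:
    """Cap the score at 4.0 when the frontmatter description is missing."""
    if description_present:
        return score
    return min(score, DESCRIPTION_MISSING_CAP)
```

a score already below the cap is left unchanged, so the rule never raises a weak skill to 4.0.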
## output contract
a structured audit report with:
- overall score (0-10, one decimal)
- per-component checklist (✓ / ⚠ / ✗) with one-line notes
- top 3 concrete suggestions for improvement (ranked by impact on tier-1 score)
- the predicted tier-1 score if the suggestions are applied
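a plain-text rendering of this report could look like the sketch below. the layout, the mapping of 2/1/0 points to ✓/⚠/✗, and the name `render_report` are illustrative assumptions. the contract above only fixes which four parts appear.

```python
MARKS = {2: "✓", 1: "⚠", 0: "✗"}  # assumed mapping from 0-2 points to marks


def render_report(score: float, components: dict, suggestions: list, predicted: float) -> str:
    """Render the four report parts; components maps name -> (points, note)."""
    lines = [f"overall score: {score:.1f} / 10", "", "components:"]
    for name, (points, note) in components.items():
        lines.append(f"{MARKS[points]} {name}: {note}")
    lines += ["", "top suggestions:"]
    # Keep only the first three; the caller is expected to pass them ranked by impact.
    for rank, suggestion in enumerate(suggestions[:3], start=1):
        lines.append(f"{rank}. {suggestion}")
    lines += ["", f"predicted score after fixes: {predicted:.1f} / 10"]
    return "\n".join(lines)
```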
## outcome signal
after the author applies the suggestions, the skill's actual tier-1 score (from list_skill_scores) moves up by at least the predicted delta. if it does not, the audit's heuristics need tuning.
## notes
- structural completeness is necessary but not sufficient. a perfectly-structured skill that does the wrong thing is still bad. tier-2 dry-run scoring catches functional quality, this audit only catches structural quality.
- the 6-component rubric is the implexa house style. anthropic, smithery, and other registries use looser structures - their high-scoring skills usually still hit most of these components even when not labeled.
- when in doubt, copy the structure of an existing high-scored implexa-curated skill (look at /scores filtered to source=implexa).