---
name: agentsec
description: >
  Audit AI agent skills for security vulnerabilities. Use when scanning
  installed skills against the OWASP Agentic Skills Top 10, checking skills
  before running them, gating CI/CD on skill safety, or generating audit
  reports (text, JSON, SARIF, HTML) for stakeholders.
version: 0.3.2
license: MIT
homepage: https://agentsec.sh
author: semiotic-ai
permissions:
  - filesystem:read
metadata:
  agentsec:
    profile: meta
  openclaw:
    emoji: "🛡️"
    homepage: https://agentsec.sh
    requires:
      anyBins:
        - agentsec
        - npx
        - bunx
    install:
      - kind: node
        package: agentsec
        bins:
          - agentsec
        label: Install agentsec (npm)
---
# agentsec
`agentsec` is a security auditing CLI for AI agent skills. It scans every skill installed in a project against the OWASP Agentic Skills Top 10 and reports vulnerabilities, misconfigurations, and governance gaps.
## When to Use
Use `agentsec` when the user asks to:
- Audit, scan, or check agent skills for security issues
- Verify installed skills are safe before running them
- Check OWASP compliance of an agent setup
- Gate a CI/CD pipeline on skill security
- Generate a security report for stakeholders
## Quick Start
The fastest path to a result — no install, no flags:
```bash
npx agentsec
```
This scans every default skills directory on the machine — grouped by platform — plus any `./skills` folder in the current project (up to two levels deep), and audits each installed skill against the OWASP Agentic Skills Top 10. Always try this first.
### Auto-discovery locations
| Platform | Paths scanned |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| **Claude Code** | `~/.claude/skills`, `./.claude/skills`, `~/.claude/plugins/*/skills/*`, `~/.claude/commands`, `./.claude/commands` |
| **OpenClaw / ClawHub** | `~/.openclaw/workspace/skills`, `~/.openclaw/workspace-*/skills` (profiles via `OPENCLAW_PROFILE`), `~/.openclaw/skills` |
| **Codex / skills.sh** | `~/.agents/skills`, `./.agents/skills`, `../.agents/skills`, `/etc/codex/skills` |
| **Other** (generic) | Any `skills/` directory found within the current project, up to two levels deep |
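
As a rough illustration of the generic rule in the last row, the sketch below walks a project tree and collects directories named `skills` no more than two levels below the root. The function name `find_skills_dirs`, the depth convention (`./skills` counts as depth 0), and the choice to skip hidden folders are assumptions made for illustration; they are not taken from agentsec's implementation.

```python
from pathlib import Path


def find_skills_dirs(root: Path, max_depth: int = 2) -> list[Path]:
    """Return directories named 'skills' at most max_depth levels below root.

    Hypothetical helper: depth 0 is root/skills, depth 1 is root/x/skills, etc.
    """
    found: list[Path] = []

    def walk(directory: Path, depth: int) -> None:
        candidate = directory / "skills"
        if candidate.is_dir():
            found.append(candidate)
        if depth == max_depth:
            return
        for child in sorted(directory.iterdir()):
            # Assumption: do not descend into a skills dir or hidden folders.
            if (child.is_dir() and child.name != "skills"
                    and not child.name.startswith(".")):
                walk(child, depth + 1)

    walk(root, 0)
    return found
```

Calling `find_skills_dirs(Path.cwd())` would list the candidate directories for the current project under these assumptions.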
## Core Commands
Every workflow starts from one of four commands. Run them with `npx agentsec` — no install needed.
```bash
# Full audit (scan + policy evaluation). Default command.
npx agentsec
# Scan only (no policy evaluation)
npx agentsec scan
# Generate a report from a previously saved audit JSON
npx agentsec report audit.json
# Manage and inspect policy presets
npx agentsec policy list
```
## Installation
`npx agentsec` needs no install. For repeated use, install globally:
```bash
# bun (recommended)
bun add -g agentsec
# npm
npm install -g agentsec
# pnpm
pnpm add -g agentsec
# yarn
yarn global add agentsec
```
Then drop the `npx` prefix:
```bash
agentsec
agentsec scan --path ./my-skills
```
## Flags
All flags work with any command.
| Flag | Short | Values | Default | Purpose |
| ------------ | ----- | ------------------------------- | ---------- | -------------------------------------------------------- |
| `--format` | `-f` | `text`, `json`, `sarif`, `html` | `text` | Output format |
| `--output` | `-o` | path | stdout | Write report to file |
| `--policy` | `-p` | preset name or path | `default` | Apply a policy preset |
| `--platform` | | `openclaw`, `claude`, `codex` | auto | Narrow to one agent platform |
| `--path` | | path | auto | Custom skill directory to scan |
| `--profile` | | `default`, `web3`, `strict` | `default` | Rule profile. `default` auto-detects Web3 skills; `web3` forces the annex on every skill |
| `--verbose` | `-v` | | off | Show detailed findings |
| `--no-color` | | | off | Disable colored output |
| `--help` | `-h` | | | Show help |
| `--version` | `-V` | | | Print version |
## Common Recipes
### Show detailed findings and remediation
```bash
npx agentsec --verbose
```
### Scan a specific directory
```bash
npx agentsec scan --path ./my-skills
```
### Target a specific agent platform
```bash
npx agentsec --platform claude
npx agentsec --platform codex
```
### Audit with a strict policy and save JSON
```bash
npx agentsec --policy strict --format json --output audit.json
```
### Generate an HTML report for stakeholders
```bash
npx agentsec --format html --output report.html
```
### Generate a SARIF report for IDE / code-scanning integration
```bash
npx agentsec --format sarif --output report.sarif
```
### List available policy presets
```bash
npx agentsec policy list
```
### Inspect the rules in a preset
```bash
npx agentsec policy show strict
```
### Validate a custom policy config file
```bash
npx agentsec policy validate ./my-policy.json
```
### Replay a previous audit as an HTML report
```bash
npx agentsec report audit.json --format html --output report.html
```
## Policy Presets
| Name | Use Case |
| -------------------- | -------------------------------------------------------------------- |
| `default` | Balanced policy. Blocks critical findings. |
| `strict` | Enterprise-grade. Blocks high and critical findings, enforces tests. |
| `permissive` | Lenient. Only blocks critical CVEs. Good for development. |
| `owasp-agent-top-10` | Built directly from the OWASP Agentic Skills Top 10. |
## Configuration File
`agentsec` auto-loads `.agentsecrc`, `.agentsecrc.json`, or `agentsec.config.json` from the current directory (or any parent):
```json
{
"format": "text",
"output": null,
"policy": "strict",
"verbose": false
}
```
CLI flags always override config file values. Omit `"platform"` and `"path"` to keep the default auto-discovery behavior — agentsec will scan every known platform's default locations.
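
The precedence rule can be pictured as a layered merge: built-in defaults, then the config file, then CLI flags. The sketch below is only an illustration of that idea. `DEFAULTS`, `find_config`, and `resolve_settings` are hypothetical names, and two assumptions are baked in: every config file (including `.agentsecrc`) is JSON, and a flag the user did not pass is represented as `None`.

```python
import json
from pathlib import Path
from typing import Optional

CONFIG_NAMES = (".agentsecrc", ".agentsecrc.json", "agentsec.config.json")
# Default values as listed in the Flags table; the dict itself is illustrative.
DEFAULTS = {"format": "text", "output": None, "policy": "default", "verbose": False}


def find_config(start: Path) -> Optional[Path]:
    """Search start (an absolute path) and each parent for a known config file."""
    for directory in (start, *start.parents):
        for name in CONFIG_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None


def resolve_settings(start: Path, cli_flags: dict) -> dict:
    """Merge defaults < config file < CLI flags (unset flags are None)."""
    settings = dict(DEFAULTS)
    config_path = find_config(start)
    if config_path is not None:
        settings.update(json.loads(config_path.read_text(encoding="utf-8")))
    settings.update({k: v for k, v in cli_flags.items() if v is not None})
    return settings
```

With the config shown above, `resolve_settings(Path.cwd(), {"format": "html"})` would keep `"policy": "strict"` from the file while the flag replaces `"format"`.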
## OWASP Agentic Skills Top 10
Every audit checks all ten risk categories:
| ID | Risk |
| ----- | ----------------------- |
| AST01 | Malicious Skills |
| AST02 | Supply Chain Compromise |
| AST03 | Over-Privileged Skills |
| AST04 | Insecure Metadata |
| AST05 | Unsafe Deserialization |
| AST06 | Weak Isolation |
| AST07 | Update Drift |
| AST08 | Poor Scanning |
| AST09 | No Governance |
| AST10 | Cross-Platform Reuse |
## AST-10 Web3 Annex (auto-detected)
Web3-touching skills are detected automatically and audited against twelve additional rules — no flag required. A skill is detected as Web3 when its manifest declares a `web3:` block, when its source imports a Web3 client library (`viem`, `ethers`, `web3`, `wagmi`, `@solana/web3.js`, `@coinbase/onchainkit`, `@privy-io`, `@biconomy`, `@zerodev`), when it references a Web3 RPC method (`eth_*`, `wallet_*`, `personal_sign`, `signTypedData`), or when it ships a `.sol` file. Detected skills are tagged `[Web3]` in the output:
```text
✔ scoped-trader v1.4.0 [Web3] C (62)
✔ helpful-summarizer v1.2.0 A (95)
```
`--profile web3` is still available — it forces the annex onto every skill regardless of detection (useful for cross-team CI consistency):
```bash
npx agentsec audit --profile web3 --path ./my-skills
```
| ID | Risk |
| ------- | ----------------------------------------------- |
| AST-W01 | Unbounded Signing Authority |
| AST-W02 | Implicit Permit / Permit2 Signature Capture |
| AST-W03 | Delegation Hijack via EIP-7702 |
| AST-W04 | Blind / Opaque Signing Surface |
| AST-W05 | RPC Endpoint Substitution & Mempool Leakage |
| AST-W06 | Unverified Contract Call Targets |
| AST-W07 | Cross-Chain / Bridge Action Replay |
| AST-W08 | MCP Chain-Tool Drift / Capability Smuggling |
| AST-W09 | Session-Key / Permission-Caveat Erosion |
| AST-W10 | Slippage / Oracle Manipulation by Agent Loop |
| AST-W11 | Key Material in Agent Memory / Logs |
| AST-W12 | No On-Chain Action Audit / Kill-Switch |
Skills can declare a `web3` block in their manifest (chains, signers, policy caps, session-key scopes, MCP server pinning, audit sink, kill-switch) so the annex can verify scoping without flagging well-bounded skills. See `docs/plans/ast10-web3-annex-rules.md` for full per-rule detection signals.
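
The four detection signals described in this section (manifest block, library import, RPC method reference, `.sol` file) can be sketched as a single predicate. This is a simplified approximation rather than agentsec's detector: `is_web3_skill`, the regular expressions, and the input shape (a manifest dict plus a mapping of file path to file text) are all assumptions.

```python
import re

WEB3_LIBRARIES = ("viem", "ethers", "web3", "wagmi", "@solana/web3.js",
                  "@coinbase/onchainkit", "@privy-io", "@biconomy", "@zerodev")

# Captures the module specifier of ES imports and CommonJS requires.
IMPORT_RE = re.compile(
    r"""(?:from\s+|require\(\s*|import\s+|import\(\s*)['"]([^'"]+)['"]""")
# RPC method names listed in the section above.
RPC_RE = re.compile(r"\b(?:eth_\w+|wallet_\w+|personal_sign|signTypedData\w*)\b")


def is_web3_skill(manifest: dict, sources: dict[str, str]) -> bool:
    """Return True if any signal fires. sources maps file path -> file text."""
    if "web3" in manifest:
        return True
    if any(path.endswith(".sol") for path in sources):
        return True
    for text in sources.values():
        for spec in IMPORT_RE.findall(text):
            # Match the package itself or a subpath / scoped package under it.
            if any(spec == lib or spec.startswith(lib + "/")
                   for lib in WEB3_LIBRARIES):
                return True
        if RPC_RE.search(text):
            return True
    return False
```

A text-based check like this will miss dynamically built import names and can fire on RPC names that only appear in comments, which is one reason to treat it as a sketch.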
## Understanding the Output
Default output is compact: each skill shows its grade and score, followed by a one-line finding summary and a PASS/WARN/FAIL status.
```text
✔ Found 6 skills
✔ fetch-data v1.0.0 D (42)
✔ deploy-helper v2.3.0 C (68)
✔ code-review v1.1.0 A (95)
6 skills scanned • avg score 78 • 4 certified
Findings: 2 critical, 1 high, 2 medium
⚠ WARN 3 high/critical finding(s) detected
```
Use `--verbose` for score breakdowns, rule IDs, file/line locations, and remediation for each finding.
## Exit Codes
- `0` — audit passed the active policy
- `1` — policy violation or fatal error
Use the exit code directly to gate CI pipelines — no special flag required:
```bash
npx agentsec --policy strict || exit 1
```
## Tips
- Start with `npx agentsec` — no install, no flags. Iterate from there.
- Add `--verbose` whenever you need to act on specific findings.
- Pipe `--format json` into `jq` or a custom script for programmatic handling.
- `strict` is the most common preset for production repositories.
- Browse the agent skills ecosystem at [skills.sh](https://skills.sh).
`agentsec` audits installed AI agent skills against the OWASP Agentic Skills Top 10 security framework, checking for malicious code, supply chain compromise, over-privileged access, unsafe deserialization, weak isolation, outdated dependencies, poor governance, and cross-platform reuse risks. Use it when you need to verify that skills are safe before running them, gate CI/CD pipelines on security compliance, generate audit reports for stakeholders (text, JSON, SARIF, or HTML), or validate Web3-enabled skills against additional signing and transaction-scoping rules.
required:
platform auto-discovery paths:
optional flags:
External connections: none required; no API keys, OAuth, or credentials needed.
1. Invoke the audit with `npx agentsec` (or `agentsec` if installed globally). With no flags, it runs a full audit with the `default` policy against all auto-discovered skill paths. Input: user invocation with optional flags. Output: audit object queued for processing.
2. Scan skills against the OWASP Agentic Skills Top 10 rules (AST01-AST10). Parse each skill's manifest (name, version, permissions, imports, declared `web3` config), source code, and dependency tree. For each skill, emit findings keyed by OWASP category (malicious skills, supply chain, over-privilege, insecure metadata, unsafe deserialization, weak isolation, update drift, poor scanning, no governance, cross-platform reuse). Input: skill metadata, source, dependencies. Output: raw findings list, one entry per detected vulnerability, each with a severity level (critical, high, medium, low).
3. Detect Web3 skills. Detection is automatic under the `default` profile; `--profile web3` applies the annex to every skill regardless of detection. A skill is tagged Web3 if it declares a `web3:` block in its manifest, imports a Web3 library (`viem`, `ethers`, `web3`, `wagmi`, `@solana/web3.js`, `@coinbase/onchainkit`, `@privy-io`, `@biconomy`, `@zerodev`), references Web3 RPC methods (`eth_*`, `wallet_*`, `personal_sign`, `signTypedData`), or ships a `.sol` file. Input: manifest, imports, RPC calls, file list. Output: boolean Web3 flag on the skill record.
4. Audit detected Web3 skills against the annex (AST-W01 through AST-W12). Check signing authority bounds, Permit/Permit2 capture, EIP-7702 delegation, blind signing surfaces, RPC endpoint substitution, unverified contract calls, cross-chain replay, MCP drift, session-key erosion, slippage/oracle manipulation, key material in memory, and the on-chain audit trail. Input: Web3 config from the manifest, source inspection. Output: Web3-specific findings.
5. Compute a grade and score per skill. Aggregate findings by severity (critical counts as -30, high -20, medium -10, low -5). Scores range from 0 to 100, and the grade is derived from the score (A 90+, B 80-89, C 70-79, D 60-69, F below 60). Input: findings list. Output: a single score and letter grade per skill.
6. Apply the policy once all skills are scored. Load the policy preset or custom config file. The policy defines pass/fail rules (e.g., "fail if any critical or high finding", "fail if avg score <70", "fail if no tests declared"). Evaluate each skill and the aggregate against those rules. Input: scores, findings, policy object. Output: PASS/WARN/FAIL status for each skill and for the audit as a whole.
7. Format output according to `--format`. `text`: human-readable summary (skill list with grades, aggregate stats, exit status). `json`: structured findings object suitable for piping to `jq` or custom scripts. `sarif`: Static Analysis Results Interchange Format for IDE integration. `html`: styled HTML report for stakeholder review. Input: audit state (skills, findings, scores, grades, policy result). Output: formatted string written to the `--output` path, or to stdout if `--output` is absent.
8. Exit with code 0 if the policy passed, or 1 on a policy violation or fatal error. Input: policy evaluation result. Output: exit code to the shell.
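
Step 5 can be made concrete with a small sketch that takes the stated weights and grade bands at face value. Two caveats apply: starting every skill at 100 before subtracting penalties is an assumption (the step gives only the per-finding weights and the 0-100 range), and the sample outputs elsewhere in this document pair scores and grades differently (for example `C (68)` and `D (42)`), so the real banding may not match this description. `score_skill` and `grade_for` are hypothetical names.

```python
# Weights and bands as stated in step 5; not verified against the tool.
SEVERITY_PENALTY = {"critical": 30, "high": 20, "medium": 10, "low": 5}
GRADE_BANDS = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))


def score_skill(severities: list[str]) -> int:
    """Assume a starting score of 100, subtract penalties, clamp at 0."""
    penalty = sum(SEVERITY_PENALTY[s] for s in severities)
    return max(0, 100 - penalty)


def grade_for(score: int) -> str:
    """Return the first band whose floor the score reaches, else F."""
    for floor, letter in GRADE_BANDS:
        if score >= floor:
            return letter
    return "F"
```

Under these assumptions a skill with one high and one medium finding scores 70 and grades C.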
If the user specifies neither `--path` nor `--platform`: scan all auto-discovery locations for all known agent platforms (Claude, OpenClaw, Codex, generic). This is the default and the recommended path for most audits.
If the user specifies `--path`: scan only that directory and skip auto-discovery entirely.
If the user specifies `--platform`: scan only the auto-discovery paths for that platform (`claude`, `openclaw`, or `codex`) and ignore other platforms' paths.
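
These three scope rules amount to a short precedence check. The sketch below uses the default locations from the auto-discovery table; `select_scan_targets` and `GENERIC_PATTERN` are hypothetical names, and letting `--path` win when both flags are given is an assumption, since the text does not say how the two interact.

```python
from typing import Optional

# Default locations copied from the auto-discovery table in this document.
PLATFORM_PATHS = {
    "claude": [
        "~/.claude/skills", "./.claude/skills", "~/.claude/plugins/*/skills/*",
        "~/.claude/commands", "./.claude/commands",
    ],
    "openclaw": [
        "~/.openclaw/workspace/skills", "~/.openclaw/workspace-*/skills",
        "~/.openclaw/skills",
    ],
    "codex": [
        "~/.agents/skills", "./.agents/skills", "../.agents/skills",
        "/etc/codex/skills",
    ],
}
# Placeholder for the generic project search (not a real glob pattern).
GENERIC_PATTERN = "skills/ (project search, up to two levels deep)"


def select_scan_targets(path: Optional[str] = None,
                        platform: Optional[str] = None) -> list[str]:
    if path is not None:
        # Assumption: --path takes priority and disables auto-discovery.
        return [path]
    if platform is not None:
        return list(PLATFORM_PATHS[platform])
    targets = [p for paths in PLATFORM_PATHS.values() for p in paths]
    targets.append(GENERIC_PATTERN)
    return targets
```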
If a skill is detected as Web3 (auto-detection or `--profile web3`): apply annex rules AST-W01 through AST-W12 in addition to the core AST01-AST10 rules. If the skill's manifest has a `web3:` block declaring scopes (chains, signers, policy caps, session-key scopes), verify that findings align with the declared bounds before flagging. Unscoped Web3 skills trigger failures under the `strict` and `owasp-agent-top-10` policies.
if user supplies --policy
If no policy is supplied: the `default` policy applies (blocks critical findings, warns on high).
If `--format json` is supplied: output the raw audit object regardless of the policy result.
If `--format sarif` is supplied: map findings to SARIF severities (critical/high to `error`, medium to `warning`, low to `note`).
If `--format html` is supplied: generate a styled report with grade badges, score breakdown, and remediation links.
if --output
If a skill scan fails (parse error, permission denied, network timeout): emit a fatal error for that skill, continue scanning the others, report the error count in the summary, and exit 1.
If no skills are found: emit a zero-skills-found message and exit 0 (this is not an error).
If policy evaluation results in WARN status: print a ⚠ warning to stderr and exit 0 (the audit ran, but the policy flag was soft). If it results in FAIL status: print a ✗ failure to stderr and exit 1.
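
The status and exit-code rules above can be sketched for the `default` policy as described in this passage (critical findings block, high findings warn). The function names are hypothetical, and the thresholds are an assumption drawn only from that description: the sample output earlier in this document shows a WARN banner with critical findings present, so the real thresholds may differ.

```python
def default_policy_status(severities: list[str]) -> str:
    """Map a list of finding severities to PASS / WARN / FAIL."""
    if "critical" in severities:
        return "FAIL"   # the default policy is described as blocking criticals
    if "high" in severities:
        return "WARN"   # and as warning on high findings
    return "PASS"


def exit_code(status: str, fatal_error: bool = False) -> int:
    """WARN and PASS exit 0; FAIL or any fatal scan error exits 1."""
    return 1 if fatal_error or status == "FAIL" else 0
```

For example, `exit_code(default_policy_status(["high", "medium"]))` is 0, while a single critical finding or a failed skill scan yields 1.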
If `--verbose` is set: expand the output to include rule IDs (AST01, AST-W05, etc.), the file paths and line numbers where each vulnerability was detected, and remediation guidance for each finding. Omit these details when `--verbose` is absent.
If `--no-color` is set: strip ANSI escape codes from text output (no effect on JSON, SARIF, or HTML).
If a custom `.agentsecrc`, `.agentsecrc.json`, or `agentsec.config.json` is found in the current or a parent directory: load it as the base config. CLI flags override config file values. Omit the `"platform"` and `"path"` keys in the config to preserve auto-discovery behavior.
If a network error (timeout, DNS failure, unreachable registry) occurs during the dependency check: record the finding as "dep check timeout" with severity medium or low, continue the audit, and exit 0 unless the policy mandates failure on incomplete scans.
Text format (default):
```text
✔ Found N skills
✔ skill-name v1.0.0 [Web3] A (95)
⚠ skill-name v2.0.0 C (68)
✗ skill-name v1.5.0 F (35)
N skills scanned • avg score M • X certified
Findings: C critical, H high, D medium, L low
[PASS|WARN|FAIL] message
```
JSON format:
```json
{
  "audit_id": "uuid",
  "timestamp": "iso8601",
  "policy_used": "strict",
  "platform_scanned": ["openclaw", "claude"],
  "skills": [
    {
      "name": "skill-name",
      "version": "1.0.0",
      "web3": false,
      "score": 95,
      "grade": "A",
      "findings": [
        {
          "id": "ast02",
          "severity": "high",
          "title": "outdated dependency",
          "file": "package.json",
          "line": 12,
          "remediation": "update lodash to ^4.17.21"
        }
      ]
    }
  ],
  "aggregate": {
    "total_skills": 6,
    "avg_score": 78,
    "critical": 2,
    "high": 1,
    "medium": 2,
    "low": 0
  },
  "policy_result": "WARN"
}
```
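
A custom script consuming this object might look like the sketch below. It assumes the field names shown in the example (`skills`, `findings`, `severity`, `score`, `policy_result`) are what the tool emits, which this document does not confirm independently; `summarize` and the `min_score` threshold are hypothetical.

```python
import json
from collections import Counter


def summarize(audit_json: str, min_score: int = 70) -> dict:
    """Count findings by severity and list skills scoring below min_score."""
    audit = json.loads(audit_json)
    by_severity = Counter(
        finding["severity"]
        for skill in audit["skills"]
        for finding in skill["findings"]
    )
    below = [s["name"] for s in audit["skills"] if s["score"] < min_score]
    return {
        "policy_result": audit["policy_result"],
        "by_severity": dict(by_severity),
        "below_threshold": below,
    }
```

A script could pass it the contents of a saved `audit.json`, or text read from standard input when the report is piped in.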
SARIF format: standard SARIF 2.1.0 output suitable for ingestion by GitHub code scanning, GitLab SAST, and IDE plugins. Each finding maps to its OWASP Agentic Skills Top 10 rule ID as `ruleId`, its severity as `level` (`error`/`warning`/`note`), and its file and line as `locations`.
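
To show what that mapping implies structurally, here is a minimal sketch that turns findings shaped like the JSON example into a SARIF 2.1.0 log. The `runs`, `results`, `ruleId`, `level`, and `physicalLocation` properties come from the SARIF specification; the function name `to_sarif`, the input shape, and the use of the finding title as the message text are assumptions, and agentsec's real output likely carries more detail (rule metadata, remediation).

```python
# Severity-to-level mapping as described in this document.
SARIF_LEVEL = {"critical": "error", "high": "error",
               "medium": "warning", "low": "note"}


def to_sarif(findings: list[dict]) -> dict:
    """Build a minimal SARIF 2.1.0 log from a flat list of findings."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["id"],
            "level": SARIF_LEVEL[finding["severity"]],
            "message": {"text": finding["title"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["file"]},
                    "region": {"startLine": finding["line"]},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "agentsec"}},
                  "results": results}],
    }
```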
html format: single self-contained html file with:
file output: if --output
Exit code: 0 if the audit completed successfully and the policy passed; 1 on a policy failure, a fatal error, or when no skills are found and the strict policy is applied.
The user sees the audit result immediately, either in the terminal or in the written report file:
- Text mode: a colored summary printed to stdout showing skill grades, aggregate stats, and a pass/fail banner. The user can read the results at once and act on the findings.
- JSON mode: a structured audit object printed to stdout (or written to a file with `--output`). The user can pipe it to `jq`, import it into a dashboard, or parse it programmatically to gate CI/CD.
- SARIF mode: a SARIF JSON file written to the `--output` path. The user uploads it to GitHub or GitLab, and vulnerabilities appear in the code-scanning UI and PR comments.
- HTML mode: an HTML file written to the `--output` path. The user opens it in a browser, and an executive or team lead sees a styled report with remediation guidance.
- Exit code: the shell sees exit code 0 (pass) or 1 (fail). A CI pipeline can branch on it: `agentsec --policy strict || exit 1` stops the build if the policy fails.
- Verbose output: the user sees rule IDs (AST05, AST-W03, etc.), exact file paths and line numbers, and specific remediation steps, so they know which skill, which rule, and where to fix.
success criteria: