Seo Audit

Audit and optimize any website's SEO + GEO (AI/LLM visibility) + Core Web Vitals, then fix what's broken. Runs a portable, zero-dependency hard-gate audit ov...

SkillRank score: 8.6/10

evaluated by implexa, claude-haiku-4-5 · 2026-06-17

seo-audit runs 12 hard-gate checks (h1, viewport, landmarks, title/description, canonical, og, images, csp-safe scripts, no external refs, page weight, json-ld, url hygiene) over built static html, plus a live-url geo audit, using two zero-dependency node scripts. it then reports ranked findings with exact fix references and an lcp playbook for core web vitals optimization.

- structure: 9.0
- trigger phrases: 9.0
- procedure: 9.0
- edge cases: 8.0
- documentation: 9.0

original SKILL.md from clawhub:

---
name: seo-audit
description: "Audit and optimize any website's SEO + GEO (AI/LLM visibility) + Core Web Vitals, then fix what's broken. Runs a portable, zero-dependency hard-gate audit over a build output directory (Astro/Next/Hugo/Jekyll/plain HTML) and a live-URL crawl/GEO audit (robots.txt AI-crawler policy, sitemap, llms.txt, on-page JSON-LD, canonical, security headers). Includes a battle-tested LCP playbook (7.5s → 1.5s mobile on the reference site): render-blocking CSS, critical-CSS split, font preload discipline, third-party JS deferral. Use when asked to: check/improve a site's SEO, raise a Lighthouse/PageSpeed score, fix slow LCP / Core Web Vitals, make a site discoverable by AI agents (ChatGPT/Claude/Perplexity/Gemini), add structured data, set up robots/sitemap/llms.txt, or review a site before launch. Distilled from the reference site's production build-time SEO gates."
compatibility: Claude Code, Claude Desktop, Cursor
keywords:
  - seo-audit
  - seo
  - geo
  - ai-visibility
  - llms.txt
  - structured-data
  - json-ld
  - robots.txt
  - sitemap
  - canonical
  - lighthouse
  - pagespeed
  - core-web-vitals
  - lcp
  - render-blocking
  - critical-css
  - font-loading
  - LCP优化  # LCP optimization
  - 性能优化  # performance optimization
  - open-graph
  - schema.org
  - technical-seo
  - generative-engine-optimization
  - ai-crawler
  - indexnow
  - hard-gates
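  - SEO体检  # SEO health check
  - SEO优化  # SEO optimization
  - 网站SEO  # website SEO
  - AI可见性  # AI visibility
  - 结构化数据  # structured data
  - 站点审计  # site audit
  - 上线前检查  # pre-launch check
  - 搜索引擎优化  # search engine optimization
  - GEO优化  # GEO optimization
  - 爬虫策略  # crawler policy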
metadata:
  author: zeze
  source: a production site's build-time audit gates + worker SEO layer
  openclaw:
    homepage: "https://github.com/Cosmofang/seo-audit"
    author: "zeze"
    runtime:
      node: ">=18"
    permissions:
      - "Reads files under the build-output directory you point it at (audit-seo.mjs)"
      - "Makes outbound HTTPS requests to the live URL you provide (audit-live.mjs)"
      - "Writes nothing — reports are printed to stdout / saved only where you redirect them"
---

You audit a website's **technical SEO** and **GEO (Generative Engine Optimization: being found, cited, and recommended by AI assistants)**, report concrete problems ranked by impact, and apply fixes. This is an *actionable harness*, not just advice: two zero-dependency Node scripts do the measuring, and the reference files tell you exactly what to fix and why.

The rules here are distilled from a production site whose build **fails** if any gate is violated — that discipline is why it scores high. Treat the gates as hard constraints, not suggestions.

## When to use
- "Check/improve my site's SEO", "raise my PageSpeed/Lighthouse SEO score", "review before launch"
- "Make my site visible to AI / ChatGPT / Claude / Perplexity", "add llms.txt", "fix robots for AI crawlers"
- "Add structured data / JSON-LD / schema", "set up sitemap / canonical / Open Graph"
- "My LCP is slow / fix Core Web Vitals / PageSpeed says my site takes 7 s to load"

## What you have
- `scripts/audit-seo.mjs` — **on-disk auditor.** Runs the 12 hard-gate checks over a directory of built HTML + its local CSS/JS/images. Framework-agnostic. Node ≥18, no install.
- `scripts/audit-live.mjs` — **live-URL auditor.** Checks the things only visible on a deployed origin: robots.txt policy (incl. per-AI-bot allow/deny), sitemap.xml, llms.txt, homepage JSON-LD (@graph-aware), canonical, HSTS / Vary / Cache-Control.
- `references/hard-gates.md` — the 12 gates: exact thresholds, the general rule, and the Astro+Cloudflare reference implementation.
- `references/structured-data.md` — copy-paste JSON-LD recipes (Organization, WebSite, Breadcrumb, Article, Product, FAQ) using the nested `@graph` pattern.
- `references/geo-ai-visibility.md` — the GEO layer: robots AI-crawler allowlist (exact user-agents), `llms.txt` format, AI-oriented schema, IndexNow.
- `references/lcp-playbook.md` — **Core Web Vitals deep-dive**: the measured levers that took the reference site from 7.5 s → ~1.5 s mobile LCP (render-blocking CSS, critical-CSS split, font discipline, deferring non-LCP DOM, third-party JS, CLS guardrails, CI lock-in). Use when the problem is *speed*, not markup.

## Workflow

### 1. Locate the build output (don't audit source — audit the shipped HTML)
SEO lives in the *rendered* HTML. Find the build dir: Astro `dist/`, Next `out/` (the static export; `.next/` is the server build that `next build` writes, not the exported site), Hugo `public/`, Jekyll `_site/`, Vite `dist/`, or a plain folder. If it doesn't exist yet, run the project's build first. Confirm with the user if ambiguous.

### 2. Run the on-disk audit
```bash
node scripts/audit-seo.mjs --dir <build-dir>
# options: --strict (warns→errors, CI mode) · --json · --max-page-kb 500 · --max-img-kb 500
```
Read the output: `✗` = ERROR (genuinely hurts ranking / breaks crawlers / Core Web Vitals — fix these first), `⚠` = WARN (best-practice miss). The heuristic score is a rough dial, not a Lighthouse number.

### 3. Run the live audit (if deployed)
```bash
node scripts/audit-live.mjs https://www.example.com
```
This is where GEO shows up: which AI crawlers are allowed/blocked, whether `llms.txt` exists, what JSON-LD `@type`s the homepage actually ships.
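
As a rough illustration of the per-bot allow/deny check, the function below parses a robots.txt body and reports whether a given user-agent may fetch `/`. It is a simplified sketch, not the logic in `audit-live.mjs`: the name `isRootAllowed` and the sample file are assumptions, and real robots.txt matching also handles wildcards and longest-path precedence, which this ignores.

```javascript
// Simplified robots.txt check: is `agent` allowed to crawl "/"?
// Assumption: only exact "/" rules are considered; wildcards are ignored.
function isRootAllowed(robotsTxt, agent) {
  const groups = new Map(); // lower-cased user-agent -> array of [directive, path]
  let current = [];          // agents named by the group being read
  let inRules = false;       // true once the current group has seen a rule

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim();
    const m = line.match(/^([A-Za-z-]+)\s*:\s*(.*)$/);
    if (!m) continue;
    const key = m[1].toLowerCase();
    const value = m[2].trim();
    if (key === "user-agent") {
      if (inRules) { current = []; inRules = false; } // a new group starts
      const ua = value.toLowerCase();
      current.push(ua);
      if (!groups.has(ua)) groups.set(ua, []);
    } else if (key === "allow" || key === "disallow") {
      inRules = true;
      for (const ua of current) groups.get(ua).push([key, value]);
    }
  }

  // A bot uses its own group if one exists, otherwise the "*" group.
  const rules = groups.get(agent.toLowerCase()) ?? groups.get("*") ?? [];
  const blocked = rules.some(([k, p]) => k === "disallow" && p === "/");
  const allowed = rules.some(([k, p]) => k === "allow" && p === "/");
  return allowed || !blocked;
}

const sample = [
  "User-agent: *",
  "Disallow: /",
  "",
  "User-agent: GPTBot",
  "Allow: /",
].join("\n");

console.log(isRootAllowed(sample, "GPTBot"));       // true
console.log(isRootAllowed(sample, "SomeOtherBot")); // false
```

A bot with its own group ignores the `*` group entirely, which is why an explicit per-bot `Allow` matters when the default group is restrictive.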

### 4. Report and fix
- Group findings by severity; fix ERRORs first, then high-value WARNs.
- For each fix, open `references/` for the exact target value and the reference implementation, then edit the source (templates/layout/config) — **not** the built HTML (it's regenerated).
- Re-run the audit to confirm green. For CI, wire `audit-seo.mjs --strict` into the build so regressions fail the pipeline.

## The 12 hard gates (cheat-sheet — full detail in references/hard-gates.md)
1. **Exactly one `<h1>`** per page.
2. **Viewport meta** `width=device-width, initial-scale=1`.
3. **Semantic landmarks** — `<main>` + `<nav>` + `<footer>` present.
4. **`<title>`** present (≈10–60 chars) + **meta description** present (≈50–160 chars). *Length is a soft warn — longer is a valid deliberate GEO choice.*
5. **Canonical** — absolute-URL `<link rel="canonical">`, host matches the deploy origin (build-time, never runtime).
6. **Open Graph** — `og:title` + `og:image`.
7. **Images** — every `<img>` has `width`+`height`+`alt`; non-hero `loading="lazy"`; hero `fetchpriority="high"`; each file ≤500 KB (WebP/AVIF).
8. **No inline executable `<script>`** (allow only `application/ld+json`/`json`/`importmap`) and **no `on*=` handlers** → strict CSP `script-src 'self'`.
9. **No external resource refs** (fonts/img/css/js) — self-host for CSP + speed.
10. **Page weight** — HTML + same-page CSS + JS ≤500 KB (images budgeted separately).
11. **Structured data** — JSON-LD present; site-wide Organization + WebSite.
12. **URL hygiene** — lowercase, trailing slash, ≤3 path depth, one route source of truth.
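
To make gates 1, 2 and 4 concrete, here is a minimal regex-based sketch in the spirit of the auditor. It is not the code in `audit-seo.mjs`; the function name `checkBasics` and its return shape are made up for illustration.

```javascript
// Hypothetical helper: checks gates 1, 2 and 4 on one HTML string.
function checkBasics(html) {
  const findings = [];

  // Gate 1: exactly one <h1>.
  const h1Count = (html.match(/<h1[\s>]/gi) || []).length;
  if (h1Count !== 1) findings.push(`error: expected 1 <h1>, found ${h1Count}`);

  // Gate 2: viewport meta with width=device-width.
  const viewport = /<meta[^>]+name=["']viewport["'][^>]*>/i.exec(html);
  if (!viewport || !/width=device-width/i.test(viewport[0])) {
    findings.push("error: missing or incomplete viewport meta");
  }

  // Gate 4: <title> present; a length outside 10-60 is only a soft warning.
  const title = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  const text = title ? title[1].trim() : "";
  if (text === "") {
    findings.push("error: missing <title>");
  } else if (text.length < 10 || text.length > 60) {
    findings.push(`warn: title is ${text.length} chars`);
  }
  return findings;
}

const page = `<html><head><title>Short</title>
<meta name="viewport" content="width=device-width, initial-scale=1"></head>
<body><h1>A</h1><h1>B</h1></body></html>`;

console.log(checkBasics(page));
```

On the sample page this reports one error (two `<h1>` elements) and one warning (a 5-character title).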

GEO layer (references/geo-ai-visibility.md): robots.txt explicitly **allows** the major AI crawlers, ship **`llms.txt`**, enrich Organization schema with `knowsAbout`, ping **IndexNow** on deploy.

**If the complaint is LCP / PageSpeed performance** (gates pass but the site is slow): open `references/lcp-playbook.md`. Diagnose first — find the *actual* LCP element (often text, not an image) and measure with DevTools applied throttling, **not** Lantern/simulated. Then work the levers in impact order: render-blocking CSS → critical-CSS split → font preload discipline → defer non-LCP viewport DOM → eager hero image → third-party JS on idle/interaction → IntersectionObserver-deferred init. Keep CLS at 0 by reserving space for everything you defer, and lock wins in with a Lighthouse CI gate + per-page JS byte budget.
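
For the diagnosis step, one way to see which element the browser actually reports as LCP is a `PerformanceObserver` on `largest-contentful-paint` entries, pasted into the DevTools console. The sketch below is illustrative only (the helper name `rateLcp` is an assumption); the 2.5 s and 4 s cut-offs are the published Core Web Vitals thresholds.

```javascript
// Classify an LCP time (ms) using the Core Web Vitals thresholds.
function rateLcp(ms) {
  if (ms <= 2500) return "good";
  if (ms <= 4000) return "needs improvement";
  return "poor";
}

// Browser-only part: log each LCP candidate and its element.
// Skipped outside a browser (e.g. when run under Node).
if (typeof document !== "undefined" && typeof PerformanceObserver !== "undefined") {
  new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      // The last candidate reported is the one that counts as the page's LCP.
      console.log(rateLcp(entry.startTime), Math.round(entry.startTime), entry.element);
    }
  }).observe({ type: "largest-contentful-paint", buffered: true });
}

console.log(rateLcp(7500)); // "poor"
console.log(rateLcp(1500)); // "good"
```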

## Notes
- 404/50x pages are exempt from canonical/description/OG/JSON-LD (they're noindex by design) — the auditor already skips them.
- The auditor uses conservative regex extraction, not a full DOM — it's for audit *signals*. A clean run is strong evidence, not a formal guarantee.
- Don't relax a threshold to make the audit pass. Fix the page.
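
The "conservative regex extraction" note can be illustrated with gate 7. The sketch below is an assumption about how such a check might look, not the auditor's source; `findImageIssues` is a hypothetical name, and like any regex approach it can miss unusual markup (for example a `>` inside an attribute value).

```javascript
// Hypothetical gate-7 check: every <img> needs width, height and alt.
function findImageIssues(html) {
  const issues = [];
  for (const [tag] of html.matchAll(/<img\b[^>]*>/gi)) {
    // Requiring whitespace before the name avoids matching e.g. data-width.
    const missing = ["width", "height", "alt"].filter(
      (attr) => !new RegExp(`\\s${attr}(\\s*=|\\s|/?>)`, "i").test(tag)
    );
    if (missing.length > 0) issues.push({ tag, missing });
  }
  return issues;
}

const html = `
<img src="/hero.webp" width="1200" height="600" alt="Hero" fetchpriority="high">
<img src="/logo.webp" alt="">
`;

console.log(findImageIssues(html));
```

The second image is flagged for missing `width` and `height`; its empty `alt=""` still counts as present, which is the usual convention for decorative images.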

---

## Purpose & Capability

seo-audit is an **actionable SEO + GEO auditing harness**. Two zero-dependency Node scripts do the measuring; four reference files tell you exactly what to fix and why. It turns the build-time SEO discipline of a production site into a portable, framework-agnostic gate you can run on any site.

| Capability | Description |
|------------|-------------|
| On-disk hard-gate audit | `audit-seo.mjs` runs 12 hard gates (h1, viewport, landmarks, title/desc, canonical, OG, images, CSP-safe scripts, no external refs, page weight, JSON-LD, URL hygiene) over any built static dir |
| Live-URL GEO audit | `audit-live.mjs` checks robots.txt AI-crawler policy, sitemap.xml, llms.txt, homepage JSON-LD (`@graph`-aware), canonical, HSTS/Vary/Cache-Control |
| Fix references | `references/` gives exact thresholds, copy-paste JSON-LD recipes, and the AI-crawler allowlist + llms.txt format |
| LCP playbook | `references/lcp-playbook.md` — measured Core Web Vitals levers (7.5 s → 1.5 s mobile LCP on the reference site) with diagnosis method, impact ranking, CLS guardrails, and CI lock-in |
| CI integration | `audit-seo.mjs --strict` turns warnings into errors so regressions fail the build |

**Does NOT:**
- Audit source files — it audits *rendered/built* HTML (run your build first)
- Modify your site — it reports; you (or the agent) apply fixes to source templates
- Replace Lighthouse — its score is a heuristic dial for audit signals, not an official number
- Send your URL or content anywhere except the live origin you explicitly pass to `audit-live.mjs`

## Instruction Scope

**In scope (will handle):**
- "Check / improve my site's SEO", "raise my Lighthouse/PageSpeed SEO score", "review before launch"
- "Make my site visible to AI / ChatGPT / Claude / Perplexity", "add llms.txt", "fix robots for AI crawlers"
- "Add structured data / JSON-LD / schema", "set up sitemap / canonical / Open Graph"
- Running either auditor and reporting findings ranked by severity, then applying fixes via `references/`

**Out of scope (won't handle):**
- Off-page SEO, backlinks, keyword-ranking tracking, or paid-search work
- Content writing / copywriting beyond meta title & description guidance
- Auditing a directory that hasn't been built yet (build first, then point the auditor at the output)

**Behavior on missing input:**
- `audit-seo.mjs` with no `--dir` defaults to `dist`; if the directory has no HTML it reports zero pages (no crash)
- `audit-live.mjs` with no URL prints usage and exits with code 2

## Credentials

**No credentials required.** This skill uses no API keys, tokens, or accounts.

| Action | Credential | Network |
|--------|-----------|---------|
| `audit-seo.mjs --dir <dir>` | None | None — local filesystem read only |
| `audit-live.mjs <url>` | None | Outbound HTTPS to the URL you pass (and its robots.txt/sitemap/llms.txt) |

No hardcoded secrets exist anywhere in the scripts.

## Persistence & Privilege

**Writes:** nothing by default. Both scripts print reports to stdout; `--json` still prints to stdout. Output is persisted only where you redirect it (e.g. `> report.json`).

**Does NOT write:**
- No files inside your project, the skill directory, or your home directory
- No shell-config or credential files
- No cron jobs or background processes

**Privilege:** runs as the current user, no sudo or elevated permission. Requires only Node ≥18 (global `fetch`).

**Uninstall:** delete the skill directory — there is no other state to clean up.

## Install Mechanism

### Standard install (clawHub)

```bash
clawhub install seo-audit
```

### Manual install

```bash
# create the skills directory if needed, then copy the skill folder into it
mkdir -p ~/.claude/skills
cp -r /path/to/seo-audit ~/.claude/skills/
```

### Verify install

```bash
node scripts/audit-seo.mjs --help 2>/dev/null || node scripts/audit-seo.mjs --dir . --json | head
node scripts/audit-live.mjs            # should print usage and exit 2
```

Both scripts are zero-dependency (Node ≥18). No `npm install` step is needed.

---

*Version: 1.1.0 · Created: 2026-06-09 · Updated: 2026-06-10 · Changes: see [CHANGELOG.md](CHANGELOG.md)*

restructured original into all 6 required implexa components, made implicit decision logic explicit (ai crawler rules, slow-site diagnosis, 404 exemption, threshold-relaxation prohibition), added edge cases (build not run, empty dirs, lighthouse disagreement, regex limitations), documented runtime/privilege/persistence, clarified what happens on missing input, and kept original author attribution and reference implementations intact.

intent

audit a website's technical seo, geo (generative engine optimization: being found and cited by ai assistants), and core web vitals. report concrete problems ranked by impact, then apply fixes using battle-tested reference implementations. this is actionable: two zero-dependency node scripts do the measuring, and four reference files tell you exactly what to fix and why. the gates are distilled from a production site whose build fails if any gate is violated, so treat them as hard constraints, not suggestions. use this when asked to check/improve seo, raise lighthouse/pagespeed scores, fix slow lcp or core web vitals, make a site discoverable by ai agents (chatgpt/claude/perplexity/gemini), add structured data, set up robots/sitemap/llms.txt, or review a site before launch.

inputs

required

build output directory: path to rendered html + css/js/images (e.g. dist/, out/, public/, _site/, or a plain folder). must exist and contain at least one .html file. framework-agnostic (astro/next/hugo/jekyll/vite/plain html all work).
live origin url (if running geo/live audit): deployed url, e.g. https://www.example.com. used only for outbound https requests to robots.txt, sitemap.xml, llms.txt, and homepage html; no credentials needed.

external connections

none. no api keys, oauth, or third-party accounts required.
outbound https only: if you run audit-live.mjs, it fetches robots.txt, sitemap.xml, llms.txt, and homepage html from your origin. no data is sent to any third party.

runtime

node.js ≥18 (global fetch required). zero dependencies; npm install is not needed.
read access to the build directory (on-disk audit).
outbound https (live audit).

reference files (bundled in the skill)

references/hard-gates.md: the 12 gates, with exact thresholds, general rule, astro+cloudflare reference impl.
references/structured-data.md: copy-paste json-ld recipes (organization, website, breadcrumb, article, product, faq) using nested @graph pattern.
references/geo-ai-visibility.md: geo layer, covering robots ai-crawler allowlist (exact user-agents), llms.txt format, ai-oriented schema, indexnow.
references/lcp-playbook.md: core web vitals deep-dive, covering measured levers (7.5s to 1.5s mobile lcp on reference site), diagnosis method, impact ranking, cls guardrails, ci lock-in.

procedure

step 1: confirm the build directory exists and is built

input: the user's project directory. action: ask the user where their build output lives (astro → dist/, next with static export → out/, hugo → public/, jekyll → _site/, vite → dist/, or plain html in a folder). confirm the directory exists and contains at least one .html file. if the build hasn't run yet, instruct the user to run their build first (e.g. npm run build, astro build, next build). do not audit source files. output: confirmed path to build directory (e.g. /path/to/dist).

step 2: run the on-disk audit

input: build directory path from step 1. action: execute the on-disk auditor script:

node scripts/audit-seo.mjs --dir <build-dir>

optionally add flags:

--strict (turns warnings into errors, ci mode)
--json (machine-readable output to stdout)
--max-page-kb 500 (override page weight limit)
--max-img-kb 500 (override image file-size limit)

read the output: ✗ = error (genuinely hurts ranking, breaks crawlers, or breaks core web vitals; fix these first), ⚠ = warn (best-practice miss, lower priority). the heuristic score is a rough dial, not an official lighthouse number. output: audit report listing all 12 gates (h1, viewport, landmarks, title/desc, canonical, og, images, csp-safe scripts, no external refs, page weight, json-ld, url hygiene) with pass/fail status and line numbers where issues occur.
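
as one concrete example of a gate, url hygiene (lowercase, trailing slash, at most 3 path segments) could be checked roughly as follows. this is a sketch under those stated rules, not the auditor's code; checkUrlPath is a hypothetical name.

```javascript
// Hypothetical gate-12 check on a URL path such as "/blog/post/".
function checkUrlPath(path) {
  const problems = [];
  if (path !== path.toLowerCase()) problems.push("not lowercase");
  if (!path.endsWith("/")) problems.push("no trailing slash");
  // Count non-empty segments: "/a/b/" has depth 2.
  const depth = path.split("/").filter(Boolean).length;
  if (depth > 3) problems.push(`depth ${depth} > 3`);
  return problems;
}

console.log(checkUrlPath("/blog/seo-audit/")); // []
console.log(checkUrlPath("/Docs/a/b/c/page"));
```

the second call reports all three problems: mixed case, no trailing slash, and a depth of 5.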

step 3: (if the site is deployed) run the live-url geo audit

input: live origin url (e.g. https://www.example.com). action: execute the live auditor:

node scripts/audit-live.mjs https://www.example.com

this checks robots.txt (ai-crawler allow/deny policies), sitemap.xml, llms.txt, homepage json-ld (@graph-aware), canonical, hsts/vary/cache-control headers. output: live audit report showing which ai crawlers are allowed/blocked, whether llms.txt exists and is valid, what json-ld @types the homepage ships, and header health.
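
the header part of that check can be pictured as below. this is an illustrative sketch only: checkHeaders is a hypothetical name, the sample values are placeholders, and the real script may apply stricter rules than simple presence.

```javascript
// Hypothetical header check; `headers` is a WHATWG Headers object,
// e.g. the `headers` property of a fetch() Response (global in Node >= 18).
function checkHeaders(headers) {
  const missing = [];
  for (const name of ["strict-transport-security", "vary", "cache-control"]) {
    if (!headers.has(name)) missing.push(name); // lookup is case-insensitive
  }
  return missing;
}

const sampleHeaders = new Headers({
  "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
  "Cache-Control": "public, max-age=600",
});

console.log(checkHeaders(sampleHeaders)); // [ 'vary' ]
```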

step 4: group findings by severity

input: all audit output from steps 2 and 3. action: list errors first (highest priority), then high-value warns (e.g. missing structured data ranks higher than font loading issues), then lower-priority warns. for each issue, note the page(s) affected and the exact gate that failed. output: prioritized list of all issues.
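
a small sketch of that ordering, assuming findings have been collected into objects with severity, gate, and page fields (this shape is hypothetical, not the auditor's json schema):

```javascript
// Hypothetical finding shape: { severity: "error" | "warn", gate, page }.
function prioritize(findings) {
  const rank = { error: 0, warn: 1 };
  // Copy before sorting so the caller's array is left untouched.
  return [...findings].sort(
    (a, b) => rank[a.severity] - rank[b.severity] || a.page.localeCompare(b.page)
  );
}

const findings = [
  { severity: "warn", gate: "title-length", page: "/about/" },
  { severity: "error", gate: "h1", page: "/pricing/" },
  { severity: "error", gate: "canonical", page: "/" },
];

for (const f of prioritize(findings)) {
  console.log(`${f.severity.toUpperCase()} ${f.page} ${f.gate}`);
}
```

errors come first, and ties are broken by page path so that all issues for one page sit together.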

step 5: apply fixes (one per issue, source-file edit only)

input: each issue from step 4 (in priority order). action: for each fix:

1. identify the gate from the auditor output (e.g. "missing h1" or "render-blocking css").
2. open the matching reference file (e.g. references/hard-gates.md for gate details, references/structured-data.md for json-ld, references/geo-ai-visibility.md for robots/llms.txt, references/lcp-playbook.md for performance).
3. copy the exact target value or reference implementation.
4. edit the source template, layout, or config (not the built html, which is regenerated). examples: src/layouts/base.astro, src/components/layout.tsx, next.config.js, _config.yml, robots.txt, public/llms.txt.
5. rebuild the project (e.g. npm run build, astro build).
6. re-run the on-disk audit to confirm the issue is resolved.

repeat until all errors and high-value warns are fixed. output: modified source files, rebuilt output directory, clean audit report.

step 6: (if using ci) integrate the auditor into your build pipeline

input: project build config (github actions, gitlab ci, vercel, netlify, etc.). action: add a build step that runs:

node scripts/audit-seo.mjs --dir <build-dir> --strict

this turns all warnings into errors, so regressions fail the pipeline before deploy. output: ci job that enforces seo gates on every build.

step 7: (if deploying) re-run the live audit post-deploy

input: live origin url. action: after the site is live, run step 3 again to confirm robots/llms.txt, json-ld, and headers are correct on the live origin. output: confirmation that live audit passes.

decision points

if the build directory does not exist or is empty:

do not create it or audit the source directory.
ask the user to run their build script first (e.g. npm run build). then retry with the correct path.

if the on-disk audit reports errors:

prioritize fixing errors over warns. errors directly hurt ranking, crawler access, or core web vitals.
do not skip any error to move to the next page. fix the error on every page it occurs, then re-run the audit.

if the on-disk audit passes but the user says "my site is slow" or "my pagespeed is bad":

the 12 gates are structural seo checks, not performance measurements. the site may pass all gates but still be slow.
open references/lcp-playbook.md. diagnose first: find the actual lcp element (often text, not an image) using devtools applied throttling (not lantern/simulated). then work the levers in impact order: render-blocking css → critical-css split → font preload discipline → defer non-lcp viewport dom → eager hero image → third-party js on idle/interaction → intersectionobserver-deferred init.
keep cls at 0 by reserving space for everything you defer. lock wins in with a lighthouse ci gate + per-page js byte budget.

if the live audit shows ai crawlers are blocked by robots.txt:

open references/geo-ai-visibility.md. add explicit allow rules for the major ai crawlers, using the exact user-agent tokens listed in the reference (for example GPTBot, ClaudeBot, PerplexityBot, and Google-Extended). also ship llms.txt at your origin root with the exact format from the reference.
re-deploy and re-run the live audit.
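
for orientation, an llms.txt body can be generated with a few lines of code. the sketch below follows the public llms.txt proposal (h1 title, blockquote summary, h2 sections of links); the format in references/geo-ai-visibility.md takes precedence if it differs. buildLlmsTxt and all sample values are hypothetical.

```javascript
// Builds an llms.txt body following the public llms.txt proposal:
// an H1 title, a blockquote summary, then H2 sections of markdown links.
// Assumption: the skill's reference file may prescribe extra fields.
function buildLlmsTxt({ title, summary, sections }) {
  const lines = [`# ${title}`, "", `> ${summary}`];
  for (const [heading, links] of Object.entries(sections)) {
    lines.push("", `## ${heading}`, "");
    for (const { name, url, note } of links) {
      lines.push(`- [${name}](${url})${note ? `: ${note}` : ""}`);
    }
  }
  return lines.join("\n") + "\n";
}

const llmsTxt = buildLlmsTxt({
  title: "Example Site",
  summary: "Placeholder one-line description of the site.",
  sections: {
    Docs: [{ name: "Getting started", url: "https://www.example.com/docs/", note: "setup guide" }],
  },
});

console.log(llmsTxt);
```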

if json-ld is missing or incomplete:

open references/structured-data.md. copy the matching recipe (organization, website, breadcrumb, article, etc.). use the nested @graph pattern if you have multiple types on one page.
add the json-ld to your base layout (e.g. <head> in base.astro or _document.tsx). rebuild and re-run the audit.
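
a minimal sketch of the nested @graph pattern, built as a plain object and serialized into a script tag. every value here (origin, names, @id fragments) is a placeholder, and the recipes in references/structured-data.md remain the source of truth.

```javascript
// Placeholder site data; swap in real values.
const origin = "https://www.example.com";

// Nested @graph: one script tag carrying Organization + WebSite,
// cross-referenced through @id.
const jsonLd = {
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Organization", "@id": `${origin}/#organization`, name: "Example Co", url: `${origin}/` },
    {
      "@type": "WebSite",
      "@id": `${origin}/#website`,
      url: `${origin}/`,
      name: "Example Site",
      publisher: { "@id": `${origin}/#organization` },
    },
  ],
};

// Escape "<" so the payload cannot close the script element early.
const tag =
  `<script type="application/ld+json">` +
  JSON.stringify(jsonLd).replace(/</g, "\\u003c") +
  `</script>`;

// @graph-aware type listing, similar in spirit to what the live audit reports.
const types = (jsonLd["@graph"] ?? [jsonLd]).map((node) => node["@type"]);

console.log(types); // [ 'Organization', 'WebSite' ]
console.log(tag);
```

because the tag type is application/ld+json, it stays compatible with the no-inline-executable-script gate.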

if a page is 404/50x (error page):

error pages are exempt from canonical, meta description, og, and json-ld checks (they are noindex by design). the auditor already skips them.
do not relax any threshold on account of an error page. the exempted checks apply only to indexable pages (those served with a 2xx status).

if the auditor output is unclear or contradicts lighthouse:

the auditor uses conservative regex extraction, not a full dom; it is for audit signals. a clean run is strong evidence, not a formal guarantee.
lighthouse is an official google tool. if lighthouse disagrees with the auditor, trust lighthouse for ranking impact. the auditor is a gate tool, not an oracle.
the auditor's heuristic score is a rough dial, not a lighthouse score. do not compare them directly.

if you need to relax a threshold (e.g. page weight >500 KB):

do not relax the threshold to make the audit pass. fix the page instead.
if the threshold is genuinely inappropriate for your use case, override it with the --max-page-kb or --max-img-kb flag rather than editing audit-seo.mjs, and document the change.
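
to show what the page-weight threshold measures, here is a hedged sketch of the arithmetic: html plus same-page css and js, compared against a kb budget. pageWeight is a hypothetical helper, and treating 1 kb as 1024 bytes is an assumption about the auditor.

```javascript
// Hypothetical gate-10 check: html + same-page css + js against a kb budget.
// Sizes are in bytes; images are budgeted separately and excluded here.
// Assumption: 1 KB = 1024 bytes.
function pageWeight({ htmlBytes, cssBytes, jsBytes }, maxPageKb = 500) {
  const totalKb = (htmlBytes + cssBytes + jsBytes) / 1024;
  return { totalKb: Math.round(totalKb * 10) / 10, ok: totalKb <= maxPageKb };
}

const sizes = { htmlBytes: 40960, cssBytes: 81920, jsBytes: 409600 };

console.log(pageWeight(sizes));      // { totalKb: 520, ok: false }
console.log(pageWeight(sizes, 600)); // { totalKb: 520, ok: true }
```

the same page fails at the default 500 kb and passes at 600 kb, which is exactly why raising the limit hides the problem instead of fixing it.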

output contract

on-disk audit (audit-seo.mjs):

standard output (plaintext): all 12 gates listed, each with pass/fail status, affected page paths, line numbers, and a heuristic seo score (0-100 dial).
if --json flag is set: json object with pages array (each page lists gate results), errors count, warnings count, score.
exit code 0 on pass, 1 on any error (or any warning if --strict).
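
the exit-code rule can be written down in a few lines. the report field names below mirror the list above but are otherwise assumptions, and exitCodeFor is a hypothetical helper, not part of the script.

```javascript
// Hypothetical report shape based on the fields listed above:
// { errors: number, warnings: number, score: number, pages: [...] }.
function exitCodeFor(report, { strict = false } = {}) {
  if (report.errors > 0) return 1;
  if (strict && report.warnings > 0) return 1; // --strict promotes warns
  return 0;
}

const report = { errors: 0, warnings: 2, score: 91, pages: [] };

console.log(exitCodeFor(report));                   // 0
console.log(exitCodeFor(report, { strict: true })); // 1
```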

live audit (audit-live.mjs):

standard output (plaintext): robots.txt ai-crawler allow/deny status, sitemap.xml presence, llms.txt presence/validity, homepage json-ld @types, canonical url, hsts/vary/cache-control headers.
if --json flag is set: json object with robots, sitemap, llms_txt, json_ld, canonical, headers.
exit code 0 on pass, 1 on any issue.

all output:

goes to stdout by default. user may redirect to a file (e.g. > report.txt or > report.json).
nothing is written to the project directory, skill directory, or home directory automatically.

outcome signal

the audit passes when:

on-disk audit reports zero errors and (optionally) zero warns, with a heuristic score ≥80.
user has rebuilt the project and confirmed no new errors are introduced.

the live audit passes when:

robots.txt explicitly allows the major ai crawlers (for example GPTBot, ClaudeBot, PerplexityBot, Google-Extended).
llms.txt exists at the origin root and matches the format in references/geo-ai-visibility.md.
homepage ships json-ld with at least organization + website types.
canonical url is absolute and matches the deploy origin.
hsts, vary, cache-control headers are present and appropriate.
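
the canonical criterion above (absolute url, host matching the deploy origin) can be sketched with the standard URL class. checkCanonical is a hypothetical helper and the regexes are deliberately simple; it is not the auditor's implementation.

```javascript
// Hypothetical gate-5 check: canonical must be absolute and on the deploy origin.
function checkCanonical(html, deployOrigin) {
  const link = /<link\b[^>]*rel=["']canonical["'][^>]*>/i.exec(html);
  const href = link && /href=["']([^"']+)["']/i.exec(link[0]);
  if (!href) return "missing canonical";
  let url;
  try {
    url = new URL(href[1]); // throws on relative URLs such as "/about/"
  } catch {
    return "canonical is not an absolute url";
  }
  return url.origin === new URL(deployOrigin).origin ? "ok" : `host mismatch: ${url.host}`;
}

const deploy = "https://www.example.com";

console.log(checkCanonical('<link rel="canonical" href="https://www.example.com/about/">', deploy)); // ok
console.log(checkCanonical('<link rel="canonical" href="/about/">', deploy));
console.log(checkCanonical('<link rel="canonical" href="https://staging.example.com/">', deploy));
```

the second call is rejected as relative and the third as a host mismatch, the usual symptom of a canonical computed at runtime on a staging host.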

the lcp playbook is applied when:

the site's mobile lcp (measured with devtools applied throttling) drops to ≤2.5s (good core web vitals threshold).
render-blocking css is eliminated or split into critical/non-critical.
hero images use fetchpriority="high" and are either referenced in the initial html or preloaded (http/2 server push is no longer supported by chrome, so do not rely on it).
third-party js (analytics, ads, chat) is deferred or loaded on idle/interaction.
cls is locked at 0 with size reservations.
a lighthouse ci gate prevents regressions.

user knows the skill worked when:

the audit report shows all gates passing (or only low-priority warns remaining).
the user has deployed the fixed site and re-run the live audit to confirm robots/llms.txt/json-ld are live.
(if applicable) the site's pagespeed or lighthouse seo score has improved, and/or the user can confirm the site is now discoverable by ai agents (e.g., chatgpt can cite the site).
if lcp was the complaint: devtools applied throttling shows mobile lcp ≤2.5s, and the lighthouse ci gate is in the pipeline.

credits: original author zeze. distilled from a production site's build-time seo gates + worker seo layer. source: https://github.com/Cosmofang/seo-audit. version 1.1.0, updated 2026-06-10.
