Knowledge base management SOP for ingesting, archiving, linting, and maintaining a personal wiki knowledge base. Use when processing raw documents, bookmarks...
---
name: knowledge-base-sop
description: Knowledge base management SOP for ingesting, archiving, linting, and maintaining a personal wiki knowledge base. Use when processing raw documents, bookmarks, or web content into structured Markdown wiki pages, performing periodic quality audits (dead links, orphan pages, trust validation), or managing a raw/wiki/outputs directory structure. Triggers on new raw content intake, bookmark pipeline runs, and knowledge base maintenance tasks.
---
# Knowledge Base SOP
## Directory Structure
- `raw/` — Original, unprocessed materials (web scrapes, bookmark exports, PDFs). **Never modify directly.**
- `wiki/` — Cleaned, structured core knowledge base (Markdown).
- `outputs/` — Temporary reports, drafts, or exported files.
## 1. Ingest & Compile
**Trigger**: New files appear in `raw/`.
- Extract core points, entities (names/projects/tech stacks), and logical relationships.
- Split long text into reasonable paragraphs of structured Markdown (H2/H3 headings, bold key terms).
- **Noise rule**: Discard content under 300 words or with no substantive technical/knowledge value. Log the discard.
- **Hallucination guard**: Strictly summarize from source. Never fabricate data or conclusions not in the original.
## 2. Archive
- **File naming**: `YYYY-MM-DD-core-topic-source.md`
- **Obsidian wikilinks**: Use `[[Concept Name]]` for cross-references in wiki pages.
- **Dedup**: Before writing, search `wiki/` for titles with >90% similarity. If found, append new content as a "supplemental update" at the bottom with a timestamp.
- See [references/archive-rules.md](references/archive-rules.md) for detailed dedup logic.
## 3. Bookmark Pipeline
**Trigger**: `raw/bookmarks.html` updated.
1. Parse HTML, extract all `<a>` tags (URL, title, meta description).
2. Headlessly visit each URL to fetch body content (strip ads, nav bars).
3. Summarize fetched content (≤300 chars), extract 5 keywords.
4. Generate Markdown cards in `wiki/Bookmarks/`.
5. Mark processed entries in the original HTML to prevent re-crawl.
## 4. Lint & Cleanup
- **Trust levels**: All newly ingested wiki pages carry a `[[待验证]]` tag. Remove only after user confirmation.
- **Dead link detection**: Weekly scan of all URLs in `wiki/`. On HTTP 404 or timeout, add `{{dead_link}}` marker at page top.
- **Orphan cleanup**: Monthly check for pages not linked by any other page. Move to `outputs/orphans/`.
## 5. Interaction Constraints
- Maintain a professional, precise communication style.
- Before destructive operations (e.g., bulk deletion), present a plan and wait for confirmation.
- Prioritize local tools. Minimize unnecessary follow-up questions.
don't have the plugin yet? install it then click "run inline in claude" again.
by @clawhub