Full-text search across structured Markdown documentation archives using SQLite FTS5. Use when you need to search large collections of Markdown articles that...
---
name: md-docs-search
description: Full-text search across structured Markdown documentation archives using SQLite FTS5. Use when you need to search large collections of Markdown articles that are separated by "---" delimiters and contain source URLs (marked with "*Source:" pattern). Provides fast BM25-ranked search with automatic source URL extraction for citations. Ideal for research, documentation lookups, and knowledge base exploration. Requires indexing documentation first with `docs.py index`.
---
# Markdown Documentation Full-Text Search
Fast, indexed full-text search across Markdown documentation archives using SQLite FTS5 with BM25 relevance ranking.
## When to Use
- Searching documentation archives for specific features, capabilities, or information
- Finding official source URLs to cite in reports
- Looking up technical specifications or configuration details
- Research across multiple documentation sources
## Document Format Expected
Articles separated by `---` delimiter with `*Source:` URL:
```markdown
# Article Title
*Source: https://docs.example.com/path/to/article.html*
Article content here...
---
# Next Article Title
*Source: https://docs.example.com/another/article.html*
More content...
```
## Quick Start
```bash
# 1. Index the documentation (one-time or when docs change)
scripts/docs.py index ./docs
# 2. Search
scripts/docs.py search "kubernetes backup" --max 5
# 3. Check index status
scripts/docs.py status
```
## Primary Tool: docs.py
The unified CLI handles all operations:
### Indexing
```bash
# Index documentation directory
scripts/docs.py index ./docs
# Force full rebuild
scripts/docs.py index ./docs --rebuild
# Custom database location
scripts/docs.py index ./docs --db /path/to/custom.db
```
### Searching
```bash
# Basic search
scripts/docs.py search "kubernetes backup"
# Boolean operators
scripts/docs.py search "AWS AND S3 AND snapshot"
# Phrase search
scripts/docs.py search '"exact phrase match"'
# Prefix search
scripts/docs.py search "kube*"
# Exclude terms
scripts/docs.py search "backup NOT restore"
# Title-only search
scripts/docs.py search "kubernetes" --title-only
# Output formats
scripts/docs.py search "kubernetes" --format json
scripts/docs.py search "kubernetes" --format markdown
# More context around matches
scripts/docs.py search "kubernetes" --context 400
# Include full content in JSON
scripts/docs.py search "kubernetes" --format json --full-content
```
### FTS5 Query Syntax
| Syntax | Meaning |
|--------|---------|
| `term1 term2` | Documents with term1 OR term2 (ranked) |
| `term1 AND term2` | Documents with both terms |
| `term1 OR term2` | Documents with either term |
| `"exact phrase"` | Exact phrase match |
| `prefix*` | Words starting with prefix |
| `term1 NOT term2` | term1 without term2 |
| `title:term` | Search only titles |
### Getting Specific Articles
```bash
# Get article by partial URL or title
scripts/docs.py get "system_requirements" --full
# Find all matching articles
scripts/docs.py get "backup" --all
```
### Status
```bash
# Check index statistics
scripts/docs.py status
```
## Workflow for Research Tasks
### Discovery Phase
```bash
# Check what's indexed
scripts/docs.py status
# Explore topics with broad searches
scripts/docs.py search "<feature>" --max 20
```
### Research Phase
```bash
# Narrow down with boolean operators
scripts/docs.py search "<feature> AND <platform>"
# Find specific information
scripts/docs.py search "limitation OR restriction OR 'not supported'"
```
### Citation Phase
Every search result includes the `Source:` URL — use this in your reports:
```markdown
According to documentation, [finding]...
Source: https://docs.example.com/path/to/article.html
```
## Multi-Source Setup
Each agent or project can have their own documentation and index:
```
~/docs/VendorA/
├── docs_part_01.md
├── docs.db # Index lives with docs
└── ...
~/docs/VendorB/
├── docs.md
├── docs.db
└── ...
```
The `docs.py` script auto-detects the database location.
## Advanced Scripts
For specialized needs:
- `scripts/fts_search.py` — Direct FTS5 search with more options
- `scripts/index_docs.py` — Standalone indexing
- `scripts/list_sources.py` — List all source URLs
- `scripts/get_article.py` — Direct article retrieval
- `scripts/search_docs.py` — Regex-based search (no index needed)
## Research Patterns
For common search patterns (feature research, architecture, security, etc.), see [references/search-patterns.md](references/search-patterns.md).
## Example Session
```bash
# What's available?
scripts/docs.py status
# Output: Files indexed: 37, Articles indexed: 32065
# Find information
scripts/docs.py search "kubernetes backup" --max 5
# Narrow to specific platform
scripts/docs.py search "kubernetes AND AWS" --max 5
# Find limitations
scripts/docs.py search "limitation OR 'not supported'"
# Get full article for citation
scripts/docs.py get "system_requirements" --full
```
## Best Practices
1. **Index once, search many times** — FTS5 is fast because it's indexed
2. **Use boolean operators** — `AND`, `OR`, `NOT` for precision
3. **Phrase search for exact terms** — `"exact match"` with quotes
4. **Always cite sources** — Include `Source:` URLs in reports
5. **Rebuild periodically** — Re-index when documentation updates
6. **Use JSON for analysis** — Pipe to `jq` or other tools for processingdon't have the plugin yet? install it then click "run inline in claude" again.