---
name: governed-agents
description: "Deterministic verification + reputation scoring for AI sub-agents. Prevents hallucinated success via 4 code gates (files, tests, lint, AST) and a 3-layer pipeline (Structural → Grounding → LLM Council) for open-ended tasks."
source: https://github.com/Nefas11/governed-agents
homepage: https://github.com/Nefas11/governed-agents
install: {"kind": "script", "script": "install.sh"}
filesystem_writes: ["~/.openclaw/workspace/.state/governed_agents/"]
capabilities: ["persistent_db_writes", "external_cli_execution", "network_requests"]
network_access: true
env_vars:
OPENCLAW_WORKSPACE: {"required": false, "description": "Workspace root directory (default: ~/.openclaw/workspace)"}
GOVERNED_WORK_DIR: {"required": false, "description": "Temporary working directory (default: /tmp/governed)"}
GOVERNED_DB_PATH: {"required": false, "description": "SQLite reputation database path"}
GOVERNED_AUTH_TOKEN: {"required": false, "description": "Bearer token for HTTP API mode"}
metadata:
{
"openclaw":
{
"emoji": "🛡️",
"type": "executable/with-install",
"source": "https://github.com/Nefas11/governed-agents",
"requires": { "bins": ["codex", "git", "pytest"] },
"optional_bins": ["ruff", "flake8", "pylint"],
"install":
[
{
"id": "script",
"kind": "script",
"command": "bash install.sh",
"label": "Install governed-agents (copy to workspace)",
},
],
},
}
capability_flags:
network-capable: true
subprocess-capable: true
---
# Governed Agents
Deterministic verification + reputation scoring for AI sub-agents. Prevents hallucinated success ("I did it!") by verifying claims independently before updating the agent's score.
**Pure Python stdlib: no third-party Python packages.** External binaries (`codex`, `git`, `pytest`) are still required.
## Capabilities
Spawns external CLIs (codex, openclaw, git, pytest) and makes HTTP HEAD requests.
## When to Use
Use this skill when you need to:
- **Spawn sub-agents** and verify their output automatically
- **Score agent reliability** across tasks (EMA-based reputation)
- **Detect hallucinated success** — agent claims "done" but files are missing or tests fail
- **Verify open-ended tasks** (research, analysis, strategy) via LLM Council
- **Enforce supervision levels** based on agent track record
## Quick Start
### Coding Tasks (Deterministic Verification)
```python
from governed_agents.contract import TaskContract
from governed_agents.orchestrator import GovernedOrchestrator
contract = TaskContract(
objective="Add JWT auth endpoint",
acceptance_criteria=["POST /api/auth returns JWT", "Tests pass"],
required_files=["api/auth.py", "tests/test_auth.py"],
run_tests="pytest tests/test_auth.py -v",
)
g = GovernedOrchestrator(contract, model="openai/gpt-5.2-codex")
# After agent completes:
result = g.record_success() # runs gates, updates reputation
```
### Open-Ended Tasks (3-Layer Pipeline + LLM Council)
```python
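from governed_agents.contract import TaskContract
from governed_agents.orchestrator import GovernedOrchestrator
contract = TaskContract(
    objective="Write architecture decision record for auth module",
    acceptance_criteria=["Trade-offs documented", "Decision stated"],
    verification_mode="council",
    task_type="analysis",
    council_size=3,
)
g = GovernedOrchestrator(contract, model="openai/gpt-5.2-codex")
# worker_output: the text the worker agent produced for this task
prompts = g.generate_council_tasks(worker_output)  # review prompts for the council members
# raw_reviewer_outputs: the council reviewers' raw responses to those prompts
result = g.record_council_verdict(raw_reviewer_outputs)
# → "Council: 2/3 approved (score=0.67, PASS ✅)"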
```
### CLI Spawning (Codex / OpenClaw)
```python
from governed_agents.contract import TaskContract
from governed_agents.openclaw_wrapper import spawn_governed
contract = TaskContract(
objective="Build a REST API for todos",
acceptance_criteria=["CRUD endpoints work", "Tests pass"],
required_files=["api.py", "tests/test_api.py"],
)
# Uses Codex 5.3 CLI by default
result = spawn_governed(contract, engine="codex53")
# Or via OpenClaw agent CLI:
result = spawn_governed(contract, engine="openclaw")
```
## Verification Modes
### Deterministic (Coding Tasks)
4 gates run automatically — all must pass:
| Gate | Check | Signal |
|------|-------|--------|
| **Files** | Required files exist and are non-empty | Hard fail |
| **Tests** | Test command exits 0 | Hard fail |
| **Lint** | No lint errors | Hard fail |
| **AST** | Python files parse without SyntaxError | Hard fail |
If agent claims SUCCESS but any gate fails → score override to `-1.0` (hallucination penalty).
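Here is a minimal sketch of how gate results and the agent's own claim could combine into the per-task score s(t) from the Reputation System table. `score_outcome` and its flags are hypothetical names chosen for illustration, not the library's API. The sketch assumes each gate reports a single pass/fail boolean.

```python
def score_outcome(
    gates: dict[str, bool],
    claimed_success: bool,
    retried: bool = False,
    reported_blocker: bool = False,
) -> float:
    """Hypothetical mapping from gate results to the task score s(t)."""
    all_passed = all(gates.values())
    if claimed_success and not all_passed:
        return -1.0  # hallucinated success: a gate contradicts the claim
    if claimed_success:
        return 0.7 if retried else 1.0  # verified success
    if reported_blocker:
        return 0.5  # honest blocker report
    return 0.0  # failed but tried


gates = {"files": True, "tests": False, "lint": True, "ast": True}
print(score_outcome(gates, claimed_success=True))  # → -1.0
```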
### Council (Open-Ended Tasks)
3-layer pipeline with short-circuit:
1. **Structural Gate** (<1s) — word count, required sections, no empty sections
2. **Grounding Gate** (5–30s) — URL reachability, citation checks
3. **LLM Council** (30–120s) — N independent reviewers, majority vote
If Layer 1 fails → no LLM calls, instant result, zero cost.
## Reputation System
```
R(t+1) = (1 − α) · R(t) + α · s(t), α = 0.3
```
| Score | Meaning |
|-------|---------|
| +1.0 | Verified success (first try) |
| +0.7 | Verified success (after retry) |
| +0.5 | Honest blocker report |
| 0.0 | Failed but tried |
| −1.0 | Hallucinated success |
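The following worked example applies the EMA update with α = 0.3. It assumes a starting reputation of 0.5, since this document does not specify one. The helper name is illustrative.

```python
ALPHA = 0.3


def update_reputation(r: float, s: float, alpha: float = ALPHA) -> float:
    """One EMA step: R(t+1) = (1 - alpha) * R(t) + alpha * s(t)."""
    return (1 - alpha) * r + alpha * s


r = 0.5  # assumed starting reputation (not specified above)
history = []
for s in (1.0, 1.0, -1.0):  # two verified first-try successes, then a hallucination
    r = update_reputation(r, s)
    history.append(round(r, 4))
print(history)  # → [0.65, 0.755, 0.2285]
```

With these numbers, a single hallucinated success drops the agent from 0.755 (`standard`) to about 0.229 (`strict`), which shows how heavily the `-1.0` penalty weighs.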
### Supervision Levels
| Reputation | Level | Effect |
|-----------|-------|--------|
| > 0.8 | autonomous | Full trust |
| > 0.6 | standard | Normal supervision |
| > 0.4 | supervised | Checkpoints required |
| > 0.2 | strict | Model override to Opus |
| ≤ 0.2 | suspended | Task blocked |
## Task-Type Profiles
Pre-configured gate combinations:
| `task_type` | Layer 1 | Layer 2 | Min words |
|-------------|---------|---------|-----------|
| `research` | word_count, sources_list | url_reachable, citations | 200 |
| `analysis` | word_count, required_sections | numbers_consistent | 150 |
| `strategy` | required_sections, word_count | cross_refs_resolve | 100 |
| `writing` | word_count | — | 50 |
| `planning` | required_sections, has_steps | dates_valid | 50 |
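To illustrate how a profile could drive the Structural Gate, the code below encodes two rows of the table as a hypothetical `PROFILES` dict and checks word count and required sections. It assumes sections are Markdown headings. The real gate's section detection, the "no empty sections" rule, and the remaining profile checks are not modeled.

```python
import re

# Hypothetical encoding of two rows of the profile table above.
PROFILES = {
    "analysis": {"min_words": 150, "checks": ("word_count", "required_sections")},
    "writing": {"min_words": 50, "checks": ("word_count",)},
}


def structural_gate(text: str, task_type: str, required_sections: list[str]) -> dict:
    profile = PROFILES[task_type]
    errors = []
    words = len(text.split())
    if "word_count" in profile["checks"] and words < profile["min_words"]:
        errors.append(f"word_count {words} < {profile['min_words']}")
    if "required_sections" in profile["checks"]:
        # Assumption: sections are Markdown headings such as "## Decision".
        headings = {h.strip().lower() for h in re.findall(r"^#+\s*(.+)$", text, flags=re.M)}
        errors += [f"missing section: {s}" for s in required_sections if s.lower() not in headings]
    return {"pass": not errors, "word_count": words, "errors": errors}


doc = "## Problem\n" + "text " * 160 + "\n## Decision\nUse JWT."
print(structural_gate(doc, "analysis", ["problem", "options", "decision"])["errors"])
# → ['missing section: options']
```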
## Installation
```bash
bash install.sh
# → Copies governed_agents/ to $OPENCLAW_WORKSPACE/governed_agents/
# → Runs verification suite (37 tests)
```
## Tests
```bash
python3 -m pytest governed_agents/test_verification.py \
governed_agents/test_council.py \
governed_agents/test_profiles.py -v
# 37 passed
```
## Workflow
Spawn sub-agents to solve coding and open-ended tasks, then verify their claims before trusting them. This skill catches hallucinated success (the agent says "done" but files are missing or tests fail). It does so by running 4 deterministic code gates for structured work, or a 3-layer pipeline (structural, grounding, LLM council) for research, analysis, and strategy tasks. It scores agent reputation with an exponential moving average, so you can tighten or relax supervision based on track record. Use it when you need automated verification of AI sub-agent output without manual spot-checks.
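### Required Context
- Binaries on `PATH`: `codex`, `git`, `pytest`. Optionally, `ruff`, `flake8`, or `pylint` for linting.
### External Connections
- Subprocess calls to the `codex`, `openclaw`, `git`, and `pytest` CLIs (for example via `spawn_governed()`).
- Network access for HTTP HEAD requests (grounding gate) and council reviewer calls.
### Environment Setup
- `OPENCLAW_WORKSPACE`: workspace root directory (default: `~/.openclaw/workspace`)
- `GOVERNED_WORK_DIR`: temporary working directory (default: `/tmp/governed`)
- `GOVERNED_DB_PATH`: SQLite reputation database path
- `GOVERNED_AUTH_TOKEN`: bearer token for HTTP API mode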
### Step 1: Define the Task Contract
**Input:** objective (string), acceptance criteria (list), task type (enum), optional verification mode.
```python
from governed_agents.contract import TaskContract

# code task (deterministic gates)
contract = TaskContract(
    objective="add jwt auth endpoint to api",
    acceptance_criteria=["POST /api/auth returns signed jwt", "all tests pass", "no lint errors"],
    required_files=["api/auth.py", "tests/test_auth.py"],
    run_tests="pytest tests/test_auth.py -v",
    task_type="code",  # may be omitted; defaults to "code"
)

# open-ended task (3-layer council)
contract = TaskContract(
    objective="write architecture decision record for auth module",
    acceptance_criteria=["trade-offs documented", "chosen approach justified", "decision stated"],
    verification_mode="council",
    task_type="analysis",
    council_size=3,
    required_sections=["problem", "options", "decision", "rationale"],
    min_words=150,
)
```
**Output:** a `TaskContract` instance with all fields validated. If required fields are missing, it raises `ValueError`.
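To illustrate the validation behavior (missing required fields raise `ValueError`), here is a hypothetical dataclass. It treats `objective` and `acceptance_criteria` as required and restricts `task_type` to the documented values. `ContractSketch` is not `TaskContract`, and its exact rules are assumptions.

```python
from dataclasses import dataclass, field

COUNCIL_TASK_TYPES = {"research", "analysis", "strategy", "writing", "planning"}


@dataclass
class ContractSketch:
    """Hypothetical stand-in showing TaskContract-style validation."""

    objective: str
    acceptance_criteria: list[str]
    task_type: str = "code"
    required_files: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.objective.strip():
            raise ValueError("objective is required")
        if not self.acceptance_criteria:
            raise ValueError("at least one acceptance criterion is required")
        if self.task_type != "code" and self.task_type not in COUNCIL_TASK_TYPES:
            raise ValueError(f"unknown task_type: {self.task_type!r}")


try:
    ContractSketch(objective="", acceptance_criteria=["Tests pass"])
except ValueError as exc:
    print(exc)  # → objective is required
```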
### Step 2: Spawn the Agent
**Input:** contract, engine (`"codex53"` or `"openclaw"`), optional model override.

Via the CLI wrapper (simplest):
```python
from governed_agents.openclaw_wrapper import spawn_governed

result = spawn_governed(contract, engine="codex53")
# result.agent_output: raw string/dict from the agent
# result.task_id: UUID for tracking
```
Via the orchestrator (more control):
```python
from governed_agents.orchestrator import GovernedOrchestrator

g = GovernedOrchestrator(contract, model="openai/gpt-5.2-codex")
# spawn_agent() returns the agent output and stores it in the working dir
agent_output = g.spawn_agent()
```
**Output:** `agent_output` (string or dict), stored in the temporary working directory at `$GOVERNED_WORK_DIR/`.
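`spawn_governed()` drives an external CLI (Codex or OpenClaw). A hypothetical helper in the same spirit, using only the standard library, could run the command, tag the run with a UUID task id, and keep the output under `$GOVERNED_WORK_DIR`. The helper name, the per-task subdirectory, and the `agent_output.txt` file name are assumptions. The demo runs a Python one-liner instead of a real agent CLI.

```python
import os
import subprocess
import sys
import uuid
from pathlib import Path


def spawn_cli(cmd: list[str], timeout: float = 600.0) -> dict:
    """Hypothetical helper: run an agent CLI and persist its output."""
    task_id = str(uuid.uuid4())
    work_dir = Path(os.getenv("GOVERNED_WORK_DIR", "/tmp/governed")) / task_id
    work_dir.mkdir(parents=True, exist_ok=True)
    proc = subprocess.run(cmd, cwd=work_dir, capture_output=True, text=True, timeout=timeout)
    (work_dir / "agent_output.txt").write_text(proc.stdout)
    return {
        "task_id": task_id,
        "agent_output": proc.stdout,
        "exit_code": proc.returncode,
        "work_dir": str(work_dir),
    }


# Stand-in command; a real run would invoke the codex or openclaw CLI instead.
result = spawn_cli([sys.executable, "-c", "print('done')"])
print(result["agent_output"].strip(), result["exit_code"])  # → done 0
```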
### Step 3a: Deterministic Gates (Code Tasks)
**Input:** `agent_output`, `required_files` list, `run_tests` string, contract.
```python
from governed_agents.verification import DeterministicVerifier

verifier = DeterministicVerifier(contract, agent_output)
gate_results = verifier.run_gates()
# gate_results = {
#     "files": {"pass": True, "errors": []},
#     "tests": {"pass": True, "errors": [], "exit_code": 0},
#     "lint": {"pass": False, "errors": ["api/auth.py:12: E501 line too long"]},
#     "ast": {"pass": True, "errors": []}
# }
```
Each gate is a dict with `"pass"` (bool) and `"errors"` (list). Gates run in order: files, tests, lint, AST. If any gate fails, the verifier sets overall `pass=False` and halts further gates.

**Output:** the `gate_results` dict, with a `pass` boolean for each gate.
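The files and AST gates need nothing beyond `pathlib` and `ast`. The sketch below uses hypothetical helper names and evaluates the two gates independently. The tests and lint gates would similarly wrap the configured commands and check their exit codes.

```python
import ast
import tempfile
from pathlib import Path


def files_gate(root: Path, required: list[str]) -> dict:
    """Required files must exist and be non-empty."""
    errors = []
    for rel in required:
        path = root / rel
        if not path.is_file():
            errors.append(f"missing: {rel}")
        elif path.stat().st_size == 0:
            errors.append(f"empty: {rel}")
    return {"pass": not errors, "errors": errors}


def ast_gate(root: Path, files: list[str]) -> dict:
    """Python files must parse without SyntaxError."""
    errors = []
    for rel in files:
        path = root / rel
        if rel.endswith(".py") and path.is_file():  # missing files are the files gate's job
            try:
                ast.parse(path.read_text(), filename=rel)
            except SyntaxError as exc:
                errors.append(f"{rel}:{exc.lineno}: {exc.msg}")
    return {"pass": not errors, "errors": errors}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "auth.py").write_text("def login(:\n    pass\n")  # syntax error
    (root / "test_auth.py").write_text("")  # exists but empty
    required = ["auth.py", "test_auth.py"]
    files_result = files_gate(root, required)
    ast_result = ast_gate(root, required)

print(files_result)  # → {'pass': False, 'errors': ['empty: test_auth.py']}
print(ast_result["pass"])  # → False
```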
### Step 3b: Council Gates (Open-Ended Tasks)
**Input:** `agent_output` (text), `task_type`, `required_sections`, optional `council_size`.
```python
from governed_agents.council import CouncilVerifier

verifier = CouncilVerifier(contract, agent_output)
layer_1 = verifier.structural_gate()
# layer_1 = {"pass": True, "word_count": 312, "required_sections_found": ["problem", "options", "decision", "rationale"]}
if layer_1["pass"]:
    layer_2 = verifier.grounding_gate()
    # layer_2 = {"pass": True, "urls_checked": 5, "unreachable": [], "citations_valid": True}
    if layer_2["pass"]:
        layer_3 = verifier.llm_council()
        # layer_3 = {"pass": True, "votes": [1, 1, 0], "approved": 2, "total": 3, "score": 0.67, "verdict": "PASS"}
council_result = verifier.aggregate()  # returns the result of the highest layer reached
```
**Short-circuit:** if layer 1 fails, the pipeline returns immediately (no layer 2 or 3). If layer 2 fails, layer 3 is skipped (no LLM calls).

**Output:** `council_result` dict with `pass` (bool), `reason` (string), and `cost_tokens` (int).
### Step 4: Record the Verdict and Update Reputation
**Input:** `gate_results` or `council_result`, `agent_id` (optional), contract.
```python
import os

from governed_agents.reputation import ReputationDB

db = ReputationDB(path=os.getenv("GOVERNED_DB_PATH"))
agent_id = "agent-uuid-here"  # placeholder: the agent's id
task_id = "task-uuid-here"  # placeholder: e.g. result.task_id from step 2

if contract.task_type == "code":
    # all four deterministic gates must pass
    passed = all(gate["pass"] for gate in gate_results.values())
    verdict = "success" if passed else "failure"
    score = 1.0 if verdict == "success" else 0.0
    evidence = gate_results
else:
    verdict = "pass" if council_result["pass"] else "fail"
    score = council_result.get("score", 0.67) if verdict == "pass" else 0.0
    evidence = council_result

# record in db
db.record_task(
    agent_id=agent_id,
    task_id=task_id,
    verdict=verdict,
    score=score,
    metadata={"gates": evidence},
)

# retrieve history and compute the EMA reputation
history = db.get_agent_history(agent_id)
new_reputation = db.compute_ema(history, alpha=0.3)
```
**Output:** `new_reputation` score (float, -1.0 to 1.0), stored in the SQLite DB.
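`ReputationDB` stores verdicts in SQLite. The class below is a hypothetical stand-in showing how `record_task`, `get_agent_history`, and `compute_ema` could fit together using the standard `sqlite3` module. Its name, schema, and the 0.5 prior for agents with no history are assumptions; only the `agent_tasks` table name comes from the track-record query at the end of this document. It omits the `metadata` column for brevity.

```python
import sqlite3
from datetime import datetime, timezone


class ReputationSketch:
    """Hypothetical SQLite store; not the library's actual schema."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS agent_tasks ("
            " agent_id TEXT, task_id TEXT, verdict TEXT, score REAL, timestamp TEXT)"
        )

    def record_task(self, agent_id: str, task_id: str, verdict: str, score: float) -> None:
        ts = datetime.now(timezone.utc).isoformat()
        with self.conn:  # commits the insert
            self.conn.execute(
                "INSERT INTO agent_tasks VALUES (?, ?, ?, ?, ?)",
                (agent_id, task_id, verdict, score, ts),
            )

    def get_agent_history(self, agent_id: str) -> list[float]:
        rows = self.conn.execute(
            "SELECT score FROM agent_tasks WHERE agent_id = ? ORDER BY rowid",
            (agent_id,),
        )
        return [score for (score,) in rows]  # oldest first

    @staticmethod
    def compute_ema(scores: list[float], alpha: float = 0.3, start: float = 0.5) -> float:
        r = start  # assumed prior for an agent with no history
        for s in scores:
            r = (1 - alpha) * r + alpha * s
        return r


db = ReputationSketch()
db.record_task("agent-1", "task-1", "success", 1.0)
db.record_task("agent-1", "task-2", "failure", -1.0)  # e.g. a hallucinated success
reputation = db.compute_ema(db.get_agent_history("agent-1"), alpha=0.3)
print(round(reputation, 4))  # → 0.155
```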
### Step 5: Map Reputation to a Supervision Level
**Input:** `new_reputation` score.
```python
supervision = (
    "autonomous" if new_reputation > 0.8 else
    "standard" if new_reputation > 0.6 else
    "supervised" if new_reputation > 0.4 else
    "strict" if new_reputation > 0.2 else
    "suspended"
)
```
**Output:** the supervision level string, used downstream to gate future task assignment.
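## Decision Tree
- **If `task_type` is `code`:** run the deterministic gates (step 3a).
- **If `task_type` is `research`, `analysis`, `strategy`, `writing`, or `planning`:** run the 3-layer council pipeline (step 3b).
- **If the agent claims success but verification fails:** the task score is overridden to `-1.0` (hallucination penalty).
- **If the reputation score drops to ≤ 0.2:** the agent is `suspended` and its tasks are blocked.
- **If the reputation score is 0.2 < score ≤ 0.4:** the agent runs under `strict` supervision, with the model overridden to Opus.
- **If the network is unreachable (grounding gate, council calls):**
- **If the task contract is missing required fields:** `TaskContract` raises `ValueError`.
- **If the test command exits non-zero but the agent says "success":** the tests gate hard-fails, so the claim counts as hallucinated success (`-1.0`).
- **If required files exist but are empty:** the files gate fails, because required files must be non-empty.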
## Outputs
### Code Tasks
Data structure returned by `verifier.run_gates()`:
```
{
  "task_id": "uuid-here",
  "verdict": "success" or "failure",
  "gates": {
    "files": {"pass": bool, "errors": [string]},
    "tests": {"pass": bool, "errors": [string], "exit_code": int, "stdout": string, "stderr": string},
    "lint": {"pass": bool, "errors": [string]},
    "ast": {"pass": bool, "errors": [string]}
  },
  "timestamp": "iso8601",
  "agent_id": "uuid",
  "reputation_score_before": float,
  "reputation_score_after": float,
  "supervision_level": "autonomous|standard|supervised|strict|suspended"
}
```
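**Files written to disk:** the SQLite reputation database (`$GOVERNED_DB_PATH`), state under `~/.openclaw/workspace/.state/governed_agents/`, and agent output under `$GOVERNED_WORK_DIR/` (default: `/tmp/governed`).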
### Council Tasks
Data structure:
```
{
  "task_id": "uuid-here",
  "verdict": "pass" or "fail",
  "layers": {
    "structural": {"pass": bool, "word_count": int, "sections_found": [string], "errors": [string]},
    "grounding": {"pass": bool, "urls_checked": int, "unreachable": [string], "citations_valid": bool},
    "council": {"pass": bool, "votes": [int], "approved": int, "total": int, "score": float, "rationales": [string]}
  },
  "cost_tokens": int,
  "timestamp": "iso8601",
  "agent_id": "uuid",
  "reputation_score_before": float,
  "reputation_score_after": float,
  "supervision_level": "autonomous|standard|supervised|strict|suspended"
}
```
**Files written:** the same locations as for code tasks.
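## Outcome Signals
- **For the operator (human):** read `supervision_level` against the Supervision Levels table. `strict` means the model is overridden to Opus, and `suspended` means new tasks are blocked.
- **For downstream automation:** use `supervision_level` to gate future task assignment.
- **Visible signals in logs and DB:** run `select * from agent_tasks where agent_id = ? order by timestamp desc limit 10` to see the recent track record.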