Governed Agents



SkillRank score: 7.3/10 (evaluated by implexa, claude-haiku-4-5 · 2026-05-26)

governed-agents enforces deterministic verification gates (files, tests, lint, ast) on coding tasks and runs a 3-layer pipeline with llm council voting for open-ended work, updating agent reputation via exponential moving average to penalize hallucination.

| criterion | score |
|-----------|-------|
| structure | 8.0 |
| trigger phrases | 6.0 |
| procedure | 8.0 |
| edge cases | 6.0 |
| documentation | 8.0 |


original SKILL.md (from clawhub):

---
name: governed-agents
description: "Deterministic verification + reputation scoring for AI sub-agents. Prevents hallucinated success via 4 code gates (files, tests, lint, AST) and a 3-layer pipeline (Structural → Grounding → LLM Council) for open-ended tasks."
source: https://github.com/Nefas11/governed-agents
homepage: https://github.com/Nefas11/governed-agents
install: {"kind": "script", "script": "install.sh"}
filesystem_writes: ["~/.openclaw/workspace/.state/governed_agents/"]
capabilities: ["persistent_db_writes", "external_cli_execution", "network_requests"]
network_access: true
env_vars:
  OPENCLAW_WORKSPACE: {"required": false, "description": "Workspace root directory (default: ~/.openclaw/workspace)"}
  GOVERNED_WORK_DIR: {"required": false, "description": "Temporary working directory (default: /tmp/governed)"}
  GOVERNED_DB_PATH: {"required": false, "description": "SQLite reputation database path"}
  GOVERNED_AUTH_TOKEN: {"required": false, "description": "Bearer token for HTTP API mode"}
metadata:
  {
    "openclaw":
      {
        "emoji": "🛡️",
        "type": "executable/with-install",
        "source": "https://github.com/Nefas11/governed-agents",
        "requires": { "bins": ["codex", "git", "pytest"] },
        "optional_bins": ["ruff", "flake8", "pylint"],
        "install":
          [
            {
              "id": "script",
              "kind": "script",
              "command": "bash install.sh",
              "label": "Install governed-agents (copy to workspace)",
            },
          ],
      },
  }
capability_flags:
  network-capable: true
  subprocess-capable: true
---

# Governed Agents

Deterministic verification + reputation scoring for AI sub-agents. Prevents hallucinated success ("I did it!") by verifying claims independently before updating the agent's score.

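**Pure Python stdlib: zero external Python package dependencies** (the `codex`, `git`, and `pytest` binaries listed under `requires` must still be installed).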

## Capabilities

Spawns external CLIs (codex, openclaw, git, pytest) and makes HTTP HEAD requests.

## When to Use

Use this skill when you need to:

- **Spawn sub-agents** and verify their output automatically
- **Score agent reliability** across tasks (EMA-based reputation)
- **Detect hallucinated success** — agent claims "done" but files are missing or tests fail
- **Verify open-ended tasks** (research, analysis, strategy) via LLM Council
- **Enforce supervision levels** based on agent track record

## Quick Start

### Coding Tasks (Deterministic Verification)

```python
from governed_agents.contract import TaskContract
from governed_agents.orchestrator import GovernedOrchestrator

contract = TaskContract(
    objective="Add JWT auth endpoint",
    acceptance_criteria=["POST /api/auth returns JWT", "Tests pass"],
    required_files=["api/auth.py", "tests/test_auth.py"],
    run_tests="pytest tests/test_auth.py -v",
)

g = GovernedOrchestrator(contract, model="openai/gpt-5.2-codex")
# After agent completes:
result = g.record_success()  # runs gates, updates reputation
```

### Open-Ended Tasks (3-Layer Pipeline + LLM Council)

```python
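from governed_agents.contract import TaskContract
from governed_agents.orchestrator import GovernedOrchestrator

contract = TaskContract(
    objective="Write architecture decision record for auth module",
    acceptance_criteria=["Trade-offs documented", "Decision stated"],
    verification_mode="council",
    task_type="analysis",
    council_size=3,
)

g = GovernedOrchestrator(contract, model="openai/gpt-5.2-codex")
# worker_output: the sub-agent's deliverable (here, the ADR text)
prompts = g.generate_council_tasks(worker_output)
# raw_reviewer_outputs: the replies from running each prompt through an independent reviewer
result = g.record_council_verdict(raw_reviewer_outputs)
# → "Council: 2/3 approved (score=0.67, PASS ✅)"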
```

### CLI Spawning (Codex / OpenClaw)

```python
from governed_agents.contract import TaskContract
from governed_agents.openclaw_wrapper import spawn_governed

contract = TaskContract(
    objective="Build a REST API for todos",
    acceptance_criteria=["CRUD endpoints work", "Tests pass"],
    required_files=["api.py", "tests/test_api.py"],
)

# Uses Codex 5.3 CLI by default
result = spawn_governed(contract, engine="codex53")
# Or via OpenClaw agent CLI:
result = spawn_governed(contract, engine="openclaw")
```

## Verification Modes

### Deterministic (Coding Tasks)

4 gates run automatically — all must pass:

| Gate | Check | Signal |
|------|-------|--------|
| **Files** | Required files exist and are non-empty | Hard fail |
| **Tests** | Test command exits 0 | Hard fail |
| **Lint** | No lint errors (skipped if no linter is installed) | Hard fail |
| **AST** | Python files parse without SyntaxError | Hard fail |

If agent claims SUCCESS but any gate fails → score override to `-1.0` (hallucination penalty).

### Council (Open-Ended Tasks)

3-layer pipeline with short-circuit:

1. **Structural Gate** (<1s) — word count, required sections, no empty sections
2. **Grounding Gate** (5–30s) — URL reachability, citation checks
3. **LLM Council** (30–120s) — N independent reviewers, majority vote

If Layer 1 fails → no LLM calls, instant result, zero cost.
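The short-circuit order can be sketched as plain control flow. In this hypothetical version, each layer is a caller-supplied function that returns True or False, and each reviewer stands in for one independent LLM call. The function name, the return keys, and the tie rule (a tie fails) are assumptions, not the package's API.

```python
from typing import Callable, Sequence

Check = Callable[[str], bool]


def run_council_pipeline(output: str, structural: Check, grounding: Check,
                         reviewers: Sequence[Check]) -> dict:
    """Run Structural -> Grounding -> LLM Council, stopping at the first failure."""
    if not reviewers:
        raise ValueError("council needs at least one reviewer")
    if not structural(output):
        # Layer 1 failed: no network requests and no LLM calls were made
        return {"pass": False, "layer": "structural"}
    if not grounding(output):
        # Layer 2 failed: the council is never convened
        return {"pass": False, "layer": "grounding"}
    votes = [1 if review(output) else 0 for review in reviewers]
    approved = sum(votes)
    return {
        "pass": approved * 2 > len(votes),  # strict majority; a tie fails
        "layer": "council",
        "votes": votes,
        "approved": approved,
        "total": len(votes),
        "score": round(approved / len(votes), 2),
    }
```

Suppose three reviewers vote approve, approve, reject. The result then has `approved == 2`, `score == 0.67`, and `pass == True`, which matches the "2/3 approved" example in the quick start.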

## Reputation System

```
R(t+1) = (1 − α) · R(t) + α · s(t),   α = 0.3
```

| Score | Meaning |
|-------|---------|
| +1.0 | Verified success (first try) |
| +0.7 | Verified success (after retry) |
| +0.5 | Honest blocker report |
|  0.0 | Failed but tried |
| −1.0 | Hallucinated success |
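A worked version of the update rule, with the score table encoded as a dictionary. The outcome keys are hypothetical labels for the table rows, not identifiers from the package.

```python
ALPHA = 0.3

# Task scores s(t) from the table above
TASK_SCORES = {
    "verified_first_try": 1.0,
    "verified_after_retry": 0.7,
    "honest_blocker": 0.5,
    "failed_but_tried": 0.0,
    "hallucinated_success": -1.0,
}


def update_reputation(reputation: float, outcome: str, alpha: float = ALPHA) -> float:
    """One EMA step: R(t+1) = (1 - alpha) * R(t) + alpha * s(t)."""
    return (1 - alpha) * reputation + alpha * TASK_SCORES[outcome]


# An agent sitting at 0.8 hallucinates once: 0.7 * 0.8 + 0.3 * (-1.0) ≈ 0.26
r = update_reputation(0.8, "hallucinated_success")

# Count the consecutive first-try successes needed to climb back above 0.8
steps = 0
while r <= 0.8:
    r = update_reputation(r, "verified_first_try")
    steps += 1
```

With α = 0.3, one hallucinated success drops an agent from the autonomous boundary (0.8) to about 0.26, which is in the strict band. It then takes four consecutive first-try successes to get back above 0.8.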

### Supervision Levels

| Reputation | Level | Effect |
|-----------|-------|--------|
| > 0.8 | autonomous | Full trust |
| > 0.6 | standard | Normal supervision |
| > 0.4 | supervised | Checkpoints required |
| > 0.2 | strict | Model override to Opus |
| ≤ 0.2 | suspended | Task blocked |

## Task-Type Profiles

Pre-configured gate combinations:

| `task_type` | Layer 1 | Layer 2 | Min words |
|-------------|---------|---------|-----------|
| `research` | word_count, sources_list | url_reachable, citations | 200 |
| `analysis` | word_count, required_sections | numbers_consistent | 150 |
| `strategy` | required_sections, word_count | cross_refs_resolve | 100 |
| `writing` | word_count | — | 50 |
| `planning` | required_sections, has_steps | dates_valid | 50 |
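A minimal structural gate driven by the `Min words` column. This sketch rests on three assumptions. Sections are taken to be markdown headings (for example `## Decision`). The checks are reduced to word count plus missing and empty sections. Profile-specific checks such as `sources_list` and `has_steps` are not modeled. All names here are hypothetical.

```python
import re

# Min words per task_type, from the profile table
MIN_WORDS = {"research": 200, "analysis": 150, "strategy": 100,
             "writing": 50, "planning": 50}


def split_sections(text: str) -> dict:
    """Map each markdown heading (lowercased) to the stripped text under it."""
    sections, current = {}, None
    for line in text.splitlines():
        heading = re.match(r"#+\s+(.*)", line)
        if heading:
            current = heading.group(1).strip().lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def structural_gate(text: str, task_type: str, required_sections=()) -> dict:
    """Layer 1: cheap offline checks that run before any network or LLM call."""
    errors = []
    word_count = len(text.split())
    if word_count < MIN_WORDS[task_type]:
        errors.append(f"word_count {word_count} < {MIN_WORDS[task_type]}")
    sections = split_sections(text)
    found = [name for name in required_sections if name.lower() in sections]
    for name in required_sections:
        body = sections.get(name.lower())
        if body is None:
            errors.append(f"missing section: {name}")
        elif not body:
            errors.append(f"empty section: {name}")
    return {"pass": not errors, "word_count": word_count,
            "sections_found": found, "errors": errors}
```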

## Installation

```bash
bash install.sh
# → Copies governed_agents/ to $OPENCLAW_WORKSPACE/governed_agents/
# → Runs verification suite (37 tests)
```

## Tests

```bash
python3 -m pytest governed_agents/test_verification.py \
                   governed_agents/test_council.py \
                   governed_agents/test_profiles.py -v
# 37 passed
```

related skills (semantically similar in the cross-vendor index; clawhub, 69% match):

TinkerClaw Agent Superpowers

Your agent says 'done' — but did it check? Superpowers turns any OpenClaw agent into a disciplined engineer. Verification iron law (evidence before claims),...


added explicit decision tree for task-type routing, network failures, and reputation thresholds; documented env setup requirements, edge cases (empty files, network retry, lint optional), and sqlite db write locations; expanded procedure into 5 discrete steps with input/output signatures; clarified gate short-circuit logic for council pipeline; added outcome signals for operator supervision level interpretation.

Governed Agents

intent

spawn sub-agents to solve coding and open-ended tasks, then verify their claims before trusting them. this skill catches hallucinated success (the agent says "done" but files are missing or tests fail). for coding tasks it runs 4 deterministic code gates. for open-ended tasks (research, analysis, strategy, writing, planning) it runs a 3-layer pipeline (structural, grounding, llm council). it scores agent reputation with an exponential moving average, so you can tighten or relax supervision based on track record. use it when you need automated verification of ai sub-agent output without manual spot-checks.

inputs

required context:

task objective (string). what the agent must do.
acceptance criteria (list of strings). what "done" means.
task type (enum: "research", "analysis", "strategy", "writing", "planning", or implicit "code" for deterministic). governs which gates run and thresholds.

external connections:

codex or openclaw cli: installed locally. used to spawn agents. set engine param to "codex53" or "openclaw". the skill wraps spawning via spawn_governed().
pytest (required for code tasks): must be in PATH. detects test command via convention (pytest, unittest, or custom command in contract).
optional linters (ruff, flake8, pylint): if one is available, the lint gate runs. if none is installed, the lint gate is skipped and the ast gate still runs (no blocker).
llm for council voting (open-ended tasks): reuses existing openai/anthropic credentials. when verification_mode="council", the reviewer tasks run on the default model (gpt-5.2-codex) unless the contract overrides it.
http for url reachability checks (grounding layer): stdlib urllib. must have outbound internet access.
sqlite database (reputation tracking): lives at $GOVERNED_DB_PATH or ~/.openclaw/workspace/.state/governed_agents/reputation.db. auto-created on first write.
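the grounding layer's reachability check can be sketched with `urllib.request` and a HEAD request. the three-way classification is an assumption modeled on this document's network-failure rule. an http error status counts as unreachable. no response at all (dns failure, refused connection, timeout) is recorded as unknown, not as a hard fail. `check_url` is a hypothetical name.

```python
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 5.0) -> str:
    """Classify a URL as 'reachable', 'unreachable', or 'unknown'."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return "reachable"
    except urllib.error.HTTPError:
        # the server answered with an error status such as 404
        return "unreachable"
    except OSError:
        # URLError, timeouts and socket errors are all OSError subclasses:
        # no answer at all, so treat it as a network problem, not a hard fail
        return "unknown"
```

some servers reject HEAD with 405, which this sketch would report as unreachable. a production check might retry those urls with GET.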

environment setup:

openclaw workspace must exist: set OPENCLAW_WORKSPACE (defaults to ~/.openclaw/workspace).
temp working dir: set GOVERNED_WORK_DIR (defaults to /tmp/governed). must be writable.
optional bearer token: GOVERNED_AUTH_TOKEN if using http api mode (not required for CLI mode).
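a sketch of how these variables might be resolved with their defaults. `resolve_paths` is a hypothetical name. deriving the default database path from the workspace root is an assumption based on the default db location listed above.

```python
import os
from pathlib import Path


def resolve_paths(env=None):
    """Resolve governed-agents paths from environment variables with defaults."""
    env = os.environ if env is None else env
    workspace = Path(env.get("OPENCLAW_WORKSPACE", "~/.openclaw/workspace")).expanduser()
    work_dir = Path(env.get("GOVERNED_WORK_DIR", "/tmp/governed"))
    # assumption: the default db lives under the workspace's .state directory
    default_db = workspace / ".state" / "governed_agents" / "reputation.db"
    db_path = Path(env.get("GOVERNED_DB_PATH", default_db)).expanduser()
    return {
        "workspace": workspace,
        "work_dir": work_dir,
        "db_path": db_path,
        "auth_token": env.get("GOVERNED_AUTH_TOKEN"),  # only needed for http api mode
    }
```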

procedure

step 1: define a task contract

input: objective (string), acceptance criteria (list), task type (enum), optional verification mode.

```python
from governed_agents.contract import TaskContract

# code task (deterministic gates)
contract = TaskContract(
    objective="add jwt auth endpoint to api",
    acceptance_criteria=["POST /api/auth returns signed jwt", "all tests pass", "no lint errors"],
    required_files=["api/auth.py", "tests/test_auth.py"],
    run_tests="pytest tests/test_auth.py -v",
    task_type="code",  # or omitted, defaults to code
)

# open-ended task (3-layer council)
contract = TaskContract(
    objective="write architecture decision record for auth module",
    acceptance_criteria=["trade-offs documented", "chosen approach justified", "decision stated"],
    verification_mode="council",
    task_type="analysis",
    council_size=3,
    required_sections=["problem", "options", "decision", "rationale"],
    min_words=150,
)
```

output: TaskContract instance with all fields validated. if required fields missing, raises ValueError.

step 2: spawn the agent (cli or direct instantiation)

input: contract, engine ("codex53" or "openclaw"), optional model override.

via cli wrapper (simplest):

```python
from governed_agents.openclaw_wrapper import spawn_governed

result = spawn_governed(contract, engine="codex53")
# result.agent_output = raw string/dict from agent
# result.task_id = uuid for tracking
```

via orchestrator (more control):

```python
from governed_agents.orchestrator import GovernedOrchestrator

g = GovernedOrchestrator(contract, model="openai/gpt-5.2-codex")
# g.spawn_agent() returns agent_output, stores in working dir
agent_output = g.spawn_agent()
```

output: agent_output (string or dict), stored in the temporary working dir at $GOVERNED_WORK_DIR//. the agent either claims success or provides a deliverable (file, text, etc.).

step 3: run verification gates (deterministic or council)

3a: deterministic gates (code tasks)

input: agent_output, required_files list, run_tests string, contract.

```python
from governed_agents.verification import DeterministicVerifier

verifier = DeterministicVerifier(contract, agent_output)
gate_results = verifier.run_gates()
# gate_results = {
#   "files": {"pass": True, "errors": []},
#   "tests": {"pass": True, "errors": [], "exit_code": 0},
#   "lint": {"pass": False, "errors": ["api/auth.py:12: E501 line too long"]},
#   "ast": {"pass": True, "errors": []}
# }
```

each gate is a dict with "pass" (bool) and "errors" (list). gates run in order: files, tests, lint, ast. if any gate fails, verifier sets overall pass=False and halts further gates.

output: gate_results dict. true/false for each.

3b: council gates (open-ended tasks)

input: agent_output (text), task_type, required_sections, optional council_size.

```python
from governed_agents.council import CouncilVerifier

verifier = CouncilVerifier(contract, agent_output)
layer_1 = verifier.structural_gate()
# layer_1 = {"pass": True, "word_count": 312, "required_sections_found": ["problem", "options", "decision", "rationale"]}

if layer_1["pass"]:
    layer_2 = verifier.grounding_gate()
    # layer_2 = {"pass": True, "urls_checked": 5, "unreachable": [], "citations_valid": True}

    if layer_2["pass"]:
        layer_3 = verifier.llm_council()
        # layer_3 = {"pass": True, "votes": [1, 1, 0], "approved": 2, "total": 3, "score": 0.67, "verdict": "PASS"}

council_result = verifier.aggregate()  # returns highest-layer result
```

short-circuit: if layer 1 fails, returns immediately (no layer 2 or 3). if layer 2 fails, skips layer 3 (no llm calls).

output: council_result dict with pass (bool), reason (string), cost_tokens (int).

step 4: record verdict and update reputation

input: gate_results or council_result, agent_id (optional), contract.

```python
import os

from governed_agents.reputation import ReputationDB

db = ReputationDB(path=os.getenv("GOVERNED_DB_PATH"))
agent_id = "agent-uuid-here"  # placeholder ids for this example
task_id = "task-uuid-here"

if contract.task_type == "code":
    # every deterministic gate (files, tests, lint, ast) must pass
    passed = all(gate["pass"] for gate in gate_results.values())
    verdict = "success" if passed else "failure"
    score = 1.0 if passed else 0.0  # -1.0 instead if the agent claimed success
    details = gate_results
else:
    passed = council_result["pass"]
    verdict = "pass" if passed else "fail"
    score = council_result.get("score", 0.67) if passed else 0.0
    details = council_result

# record in db
db.record_task(
    agent_id=agent_id,
    task_id=task_id,
    verdict=verdict,
    score=score,
    metadata={"gates": details},
)

# retrieve and compute ema
history = db.get_agent_history(agent_id)
new_reputation = db.compute_ema(history, alpha=0.3)
```

output: new_reputation score (float, -1.0 to 1.0), stored in sqlite db.

step 5: determine supervision level

input: new_reputation score.

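```python
# map reputation to the bands in the supervision levels table
# (a conditional expression, not a set literal)
supervision = (
    "autonomous" if new_reputation > 0.8 else
    "standard" if new_reputation > 0.6 else
    "supervised" if new_reputation > 0.4 else
    "strict" if new_reputation > 0.2 else
    "suspended"
)
```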

output: supervision level string. used downstream to gate future task assignment.

decision points

if task_type is code:

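run 4 deterministic gates in sequence (files, tests, lint, ast). if any fails, mark overall failure. if the agent also claimed success, apply the -1.0 hallucination penalty; otherwise score the task 0.0 (failed but tried).
if no linter binary (ruff, flake8, pylint) is installed, skip the lint gate and continue (lint is optional). the ast gate needs no external binary, so it always runs.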

if task_type is research, analysis, strategy, writing, or planning:

route to council pipeline. run structural gate first.
if structural gate fails, return immediately with fail verdict. do not call llm.
if structural passes, run grounding gate (url reachability, citation checks).
if grounding fails, return fail verdict. do not call llm council.
if both layers pass, spawn council_size independent llm reviewers. majority vote determines pass/fail.

if agent claims success but verification fails:

set the task score to -1.0 (hallucination flag) and feed it into the ema update like any other task score. log the incident with metadata.

if reputation score drops to ≤ 0.2:

suspend agent. future task spawn requests raise TaskSuspendedError. require explicit un-suspend.

if reputation score is 0.2 < score ≤ 0.4:

override the model: spawn the agent with "claude-3-opus" instead of the default. enforce checkpoints (require human review before the agent proceeds to the next step).

if network unreachable (grounding gate, council calls):

grounding gate: log urls as "unknown" (not hard fail). continue to next layer.
council calls: retry 3x with exponential backoff (1s, 2s, 4s). if all fail, mark council as "degraded mode" (use cached priors or abstain voting). do not crash.
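the council retry rule can be sketched as a small helper. reading "retry 3x" as one initial attempt plus three retries is an assumption, and the names are hypothetical. `sleep` is injectable so that the delays can be checked without actually waiting. only `OSError` triggers a retry. that covers urllib and socket failures, including timeouts, while programming errors still surface immediately.

```python
import time


def call_with_backoff(call, delays=(1, 2, 4), sleep=time.sleep):
    """Call call(); on network errors wait 1s, 2s, 4s between attempts.

    Returns (result, degraded). degraded=True means every attempt failed.
    """
    for delay in (*delays, None):
        try:
            return call(), False
        except OSError:
            if delay is None:
                # all retries exhausted: caller switches to degraded mode
                return None, True
            sleep(delay)
```

a `(None, True)` result is the cue to enter degraded mode and either abstain or fall back to a cached vote.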

if task contract missing required fields:

raise ValueError immediately. halt execution. log contract to debug log.

if test command exits non-zero but agent says "success":

hard fail tests gate. mark as hallucination. apply -1.0 penalty.

if required files exist but are empty:

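files gate fails: the gate requires every required file to exist and be non-empty, so an empty file is treated like a missing one. net result: overall fails, and if the agent claimed success, the -1.0 hallucination penalty applies. (the ast gate alone would pass, since an empty file is valid python.)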

output contract

for code tasks (deterministic):

data structure written to verification_report.json (run_gates() itself returns only the per-gate results shown in step 3a):

```
{
  "task_id": "uuid-here",
  "verdict": "success" or "failure",
  "gates": {
    "files": {"pass": bool, "errors": [string]},
    "tests": {"pass": bool, "errors": [string], "exit_code": int, "stdout": string, "stderr": string},
    "lint": {"pass": bool, "errors": [string]},
    "ast": {"pass": bool, "errors": [string]}
  },
  "timestamp": "iso8601",
  "agent_id": "uuid",
  "reputation_score_before": float,
  "reputation_score_after": float,
  "supervision_level": "autonomous|standard|supervised|strict|suspended"
}
```

files written to disk:

$GOVERNED_WORK_DIR//agent_output.txt (raw agent output)
$GOVERNED_WORK_DIR//verification_report.json (above structure)
$GOVERNED_DB_PATH (or ~/.openclaw/workspace/.state/governed_agents/reputation.db) sqlite db with agent reputation history

for open-ended tasks (council):

data structure:

```
{
  "task_id": "uuid-here",
  "verdict": "pass" or "fail",
  "layers": {
    "structural": {"pass": bool, "word_count": int, "sections_found": [string], "errors": [string]},
    "grounding": {"pass": bool, "urls_checked": int, "unreachable": [string], "citations_valid": bool},
    "council": {"pass": bool, "votes": [int], "approved": int, "total": int, "score": float, "rationales": [string]}
  },
  "cost_tokens": int,
  "timestamp": "iso8601",
  "agent_id": "uuid",
  "reputation_score_before": float,
  "reputation_score_after": float,
  "supervision_level": "autonomous|standard|supervised|strict|suspended"
}
```

files written:

$GOVERNED_WORK_DIR//agent_output.txt
$GOVERNED_WORK_DIR//verification_report.json
$GOVERNED_DB_PATH sqlite db

outcome signal

for the operator (human):

check supervision_level in output. if "suspended", agent is blocked and needs un-suspend.
check verdict. "success" or "pass" means gates/council approved. "failure" or "fail" means blocked.
check reputation_score_after. trending up = agent reliable. trending down = agent needs closer watch or model downgrade.
check gates or layers detail. if the files gate failed, the agent didn't create the required files. if tests failed, the code doesn't work. if the council passed 2/3 but you disagree, the dissenting reviewer's reasoning is available in layers.council.rationales for audit.

for downstream automation:

if supervision_level == "autonomous", assign next task immediately without human review.
if supervision_level == "standard", assign task and log to monitoring dashboard.
if supervision_level == "supervised" or "strict", insert checkpoint: human must review agent's prior task output before next task.
if supervision_level == "suspended", reject task spawn request. escalate to operator.
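the routing rules above can be expressed as one function. the action strings and the `prior_output_reviewed` flag are hypothetical. a real integration would map them onto its own scheduler and monitoring hooks.

```python
def route_next_task(supervision_level: str, prior_output_reviewed: bool) -> str:
    """Map a supervision level to the downstream action for the next task."""
    if supervision_level == "autonomous":
        return "assign"
    if supervision_level == "standard":
        return "assign_and_log"  # assign and log to the monitoring dashboard
    if supervision_level in ("supervised", "strict"):
        # checkpoint: a human must review the prior output first
        return "assign" if prior_output_reviewed else "await_human_review"
    if supervision_level == "suspended":
        return "reject_and_escalate"
    raise ValueError(f"unknown supervision level: {supervision_level!r}")
```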

visible signals in logs and db:

verification_report.json in working dir shows all gate/layer details. parseable json.
sqlite db (reputation.db) contains full history: query select * from agent_tasks where agent_id = ? order by timestamp desc limit 10 to see recent track record.
if hallucination is detected (agent claims success but verification fails), the task is scored -1.0, which pulls reputation_score_after down sharply, and logs will contain "HALLUCINATION_DETECTED".
if network error during council, logs will show "NETWORK_DEGRADED_MODE" and council will abstain or use cached vote.
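the track-record query above can be run with python's built-in `sqlite3` module. only the table and column names in that query (`agent_tasks`, `agent_id`, `timestamp`) come from this document. the rest of the schema is unknown, so this sketch returns rows as plain dicts. ordering by `timestamp` assumes iso8601 strings in one consistent format, which sort chronologically as text. `recent_tasks` is a hypothetical name.

```python
import sqlite3

RECENT_TASKS_SQL = (
    "SELECT * FROM agent_tasks WHERE agent_id = ? "
    "ORDER BY timestamp DESC LIMIT ?"
)


def recent_tasks(db_path: str, agent_id: str, limit: int = 10) -> list:
    """Return an agent's most recent task rows as dicts, newest first."""
    conn = sqlite3.connect(db_path)
    try:
        conn.row_factory = sqlite3.Row  # rows support access by column name
        rows = conn.execute(RECENT_TASKS_SQL, (agent_id, limit)).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()
```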