Agent Firewall

Real-time input/output filtering for agent communications. Block prompt injection, data exfiltration, and unauthorized commands before they reach the model.

installs

stars

karma

SkillRank score ↗

5.5/ 10

evaluated by implexa, claude-haiku-4-5 · 2026-06-04

agent-firewall provides real-time filtering of agent inputs and outputs to block injection attacks, data exfiltration, and unauthorized commands. architecture diagram and configuration examples present, but lacks integration steps and concrete invocation patterns.

structure

6.0

trigger phrases

2.0

procedure

4.0

edge cases

6.0

documentation

7.0

strengths

SKILL.md

---
name: agent-firewall
description: "Real-time input/output filtering for agent communications. Block prompt injection, data exfiltration, and unauthorized commands before they reach the model."
metadata: {"openclaw":{"emoji":"🧱","category":"blue-team"}}
---

# Agent Firewall — Input/Output Guardian

## Architecture

```
[Channel Input] → [INPUT FILTER] → [Agent/Model] → [OUTPUT FILTER] → [Channel Output]
                        ↓                                  ↓
                  ┌─────────────┐                  ┌──────────────┐
                  │ Block List  │                  │ Secret Scan  │
                  │ Pattern DB  │                  │ PII Redact   │
                  │ Rate Limit  │                  │ Path Scrub   │
                  │ Encoding Det│                  │ URL Checker  │
                  └─────────────┘                  └──────────────┘
```

## Input Filters

| # | Filter | Description |
|---|--------|-------------|
| 1 | Injection patterns | Regex + heuristic match for "ignore previous", "you are now", role confusion |
| 2 | Unicode sanitizer | Strip zero-width chars, control characters, RTL overrides |
| 3 | Encoding detector | Detect Base64, hex, ROT13 encoded payloads in user messages |
| 4 | Role confusion | Detect fake system messages, assistant impersonation |
| 5 | Rate limiter | Max messages per user per channel per minute |
| 6 | Size limiter | Reject inputs exceeding token budget |

## Output Filters

| # | Filter | Description |
|---|--------|-------------|
| 1 | Secret scanner | High-entropy strings + known patterns (AWS key, GitHub token) |
| 2 | PII redactor | Email, phone, SSN, credit card → `[REDACTED]` |
| 3 | Path scrubber | Remove internal filesystem paths from outputs |
| 4 | URL checker | Block responses containing known malicious URLs |
| 5 | Consistency check | Verify output doesn't contradict system prompt directives |

## Configuration

```yaml
# .security/firewall-rules.yaml
input:
  injection_patterns:
    - pattern: "ignore (all )?previous instructions"
      action: BLOCK
      severity: CRITICAL
    - pattern: "you are now (?!helping)"
      action: BLOCK
      severity: HIGH
  rate_limit:
    max_per_minute: 30
    max_per_hour: 500
  max_input_tokens: 4096

output:
  secret_patterns:
    - name: aws_key
      pattern: "AKIA[0-9A-Z]{16}"
      action: REDACT
    - name: github_token
      pattern: "gh[ps]_[A-Za-z0-9_]{36,}"
      action: REDACT
  pii_redaction: true
  path_scrubbing: true
```

## Guardrails

- Firewall rules are append-only in production — deletion requires human approval
- False positives → log, alert, pass through with warning (don't silently drop)
- All blocks are logged with: timestamp, rule matched, full context, channel, user hash
- Firewall itself cannot be disabled by agent instructions
- Rules file is read-only from the agent's perspective

don't have the plugin yet? install it then click "run inline in claude" again.

Agent Firewall

SKILL.md

related skills