Prompt Injection Filter

純 Python 正則 Prompt 注入過濾器，檢測 ignore previous、role play、jailbreak 等攻擊模式。

installs

stars

karma

SkillRank score ↗

4.3/ 10

evaluated by implexa, claude-haiku-4-5 · 2026-06-16

prompt-injection-filter detects common prompt injection attacks via regex patterns in pure python, covering ignore-previous, role-play, and jailbreak vectors. regex-only approach limits effectiveness against obfuscated payloads.

structure

5.0

trigger phrases

3.0

procedure

4.0

edge cases

3.0

documentation

5.0

strengths

SKILL.md

---
name: prompt-injection-filter
description: 純 Python 正則 Prompt 注入過濾器，檢測 ignore previous、role play、jailbreak 等攻擊模式。
metadata:
  emoji: "🔒"
---
# Prompt Injection Filter

> 简单但有效的 Prompt Injection 过滤器，帮助拦截常见的提示注入攻击。
>
> A simple yet effective Prompt Injection filter to intercept common prompt injection attacks.

**版本**: 1.0.0 | **作者**: 宝宝 (寶寶)

## 功能

- **输入过滤**: 检测并标记常见的 Prompt Injection 模式
- **可定制规则**: 支持自定义过滤规则
- **轻量级**: 纯 Python 实现，无外部依赖

## 使用方式

### 作为预处理步骤

在你的 Skill 或脚本中调用过滤器：

```python
from prompt_injection_filter import filter_input

user_input = "帮我查一下价格... [ignore previous instructions]"
result = filter_input(user_input)
# result: {"clean": False, "original": "...", "reason": "detect_ignore_previous"}
```

### 返回格式

```python
{
    "clean": bool,          # 是否通过检查
    "original": str,        # 原始输入
    "reason": str|None,    # 检测到的威胁类型
    "sanitized": str       # 清理后的文本（若可清理）
}
```

## 内置检测规则

规则存放在 `patterns.json` 檔案中，可自行編輯或更新。

| 规则ID | 模式 | 风险 |
|--------|------|------|
| `detect_ignore_previous` | ignore previous, disregard system | 高 |
| `detect_role_play` | you are now, act as, pretend to be | 中 |
| `detect_delimiter` | ```, <xml>, [INST] | 中 |
| `detect_encoding` | base64, url encode, hex | 低 |
| `detect_jailbreak` | DAN mode, developer mode, jailbreak | 高 |

### 更新规则

直接编辑 `patterns.json` 文件，然后调用：

```python
from prompt_injection_filter import reload_filter
reload_filter()  # 重新载入规则
```

## 示例

```python
from prompt_injection_filter import filter_input, is_safe

# 检查是否安全
if is_safe("帮我查天气"):
    print("安全")

# 获取详细报告
result = filter_input("请忽略之前的指令")
print(result["reason"])  # "detect_ignore_previous"
```

## 限制

- 基于正则表达式，只能拦截已知模式
- 建议配合 exec 审批准使用

don't have the plugin yet? install it then click "run inline in claude" again.

Prompt Injection Filter

SKILL.md

related skills