Tvs Code Reviewer

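A stable, evidence-driven code review Skill. Use it when the user asks for a code review, a review, bug hunting, problem finding, a sharp-tongued review, a check of the current diff, a PR review, or a review of specified files. It must confirm the scan scope first, look for real problems through fixed review channels, and output findings sorted by severity with evidence, impact, and confidence. Problem titles must be sharp-tongued but accurate. It only points out problems and does not modify code on the user's behalf.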

SkillRank score: 8.2 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

tvs-code-reviewer enforces evidence-driven code review via a nine-channel inspection framework, requiring explicit scope confirmation and severity-ranked findings with proof, triggers, and impact assessment. outputs are ranked by risk and exclude refactoring suggestions.

structure: 9.0
trigger phrases: 8.0
procedure: 9.0
edge cases: 8.0
documentation: 8.0

original SKILL.md from clawhub

---
name: tvs-code-reviewer
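description: A stable, evidence-driven code review Skill. Use it when the user asks for a code review, a review, bug hunting, problem finding, a sharp-tongued review, a check of the current diff, a PR review, or a review of specified files. It must confirm the scan scope first, look for real problems through fixed review channels, and output findings sorted by severity with evidence, impact, and confidence. Problem titles must be sharp-tongued but accurate. It only points out problems and does not modify code on the user's behalf.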
disable-model-invocation: true
---

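# Stable Code Review

Goal: find real problems with a stable process instead of scanning by gut feeling each time.

Core principles:

```text
No evidence, no finding.
If the problem cannot be reproduced or its impact cannot be derived from the code, report it only as a risk or as "to be confirmed", never as a confirmed bug.
Review output should first help the user judge where things will break, why they will break, and how severe the damage is.
The sharp tongue exists to make problems stand out, not to talk nonsense.
```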

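## Usage Rules

- The user must provide an explicit scan scope: files, a directory, a code snippet, the current diff, a PR, or a commit range.
- If no scope is given, stop the review and ask the user to supply one.
- Review only the scope the user specified; when a problem outside the scope turns up, mention it briefly under "Out-of-scope observations" and do not expand on it.
- Do not provide fix code, do not modify files, and do not turn the review into an implementation.
- You may point out the minimal way to verify a problem, but do not write a complete fix plan.
- The tone should be professional, pointed, and a little sharp-tongued; the sharpness must serve understanding of the problem and must not become personal attack or exaggeration.
- Conclusions must rest on real code, diffs, configuration, tests, logs, or evidence the user provided.
- When unsure, write "to be confirmed"; do not pass it off as a settled conclusion.

When there is no explicit scope, you must reply:

```text
Please provide an explicit scan scope, such as a specific directory, file, code snippet, the current diff, a PR, or a commit range. Asking me to scan the whole repository with no scope is basically firing with your eyes closed.
```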

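## Fixed Review Procedure

### 1. Scope Gate

First confirm which kind of object is under review:

- The current diff
- Specified files or directories
- A PR / commit range
- A code snippet pasted by the user
- Failure output or logs from a run

If the scope is a diff, look first at how behavior differs before and after the change, not only at the surface of the new code.

### 2. Evidence Collection

Before drawing conclusions, collect the following evidence as needed:

- Relevant source code and callers.
- Types, interfaces, data structures, and configuration.
- Similar implementations or legacy paths.
- Tests, builds, lint output, or failure logs.
- Routing, permissions, state, data flow, and persistence boundaries.

When evidence is insufficient, output "insufficient evidence, cannot confirm"; do not invent a problem.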

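### 3. Nine Review Channels

Every channel below must be checked in order. Not every channel has to produce a finding, but each one must be walked through mentally.

1. **Correctness / behavior regression**
   - Whether any conditional branch is missing.
   - Whether null values, boundary values, or abnormal states take the wrong path.
   - Whether the new behavior is incompatible with the old.
   - Whether the change breaks a caller's implicit contract.

2. **Data and state consistency**
   - Whether request parameters, response fields, database fields, and Store state agree.
   - Whether caching, persistence, pagination, sorting, or filtering can cause dirty or wrong reads.
   - Whether a multi-step flow can partially succeed or leave state out of sync.

3. **Permissions / security / privacy**
   - Whether there are missing permission checks, unauthorized access, or leaked sensitive fields.
   - Whether tokens, secrets, internal errors, or user private data are exposed to the frontend or logs.
   - Whether there are risks such as injection, open redirects, arbitrary file/URL access, or weak validation.

4. **Error handling / concurrency / async**
   - Whether Promises, request cancellation, repeated clicks, and race conditions are handled.
   - Whether failure paths can leave loading stuck, swallow errors, or falsely report success.
   - Whether retries, timeouts, idempotency, or duplicate submissions carry risk.

5. **Interface contract / compatibility**
   - Whether APIs, RPCs, component props, hook return values, or event names are broken.
   - Whether types are more optimistic than the runtime.
   - Whether a public capability gets a breaking rename or a change in return structure.

6. **Testing and verification gaps**
   - Whether tests or executable checks cover the critical paths.
   - Whether the change needs unit, integration, end-to-end, or manual verification.
   - If there is no realistic verification path, say so explicitly.

7. **Architecture boundaries / maintainability**
   - Whether the change bypasses the project's existing layering, request layer, state layer, or permission layer.
   - Whether business rules are stuffed into the UI or UI details into the business layer.
   - Whether it introduces duplicate implementations, implicit coupling, or side effects that are hard to trace.

8. **Comments / readability / developer understanding**
   - Whether state variables, core functions, and business functions lack necessary JSDoc.
   - Whether complex functions lack structured JSDoc or an explanation of why they work this way.
   - Whether comments merely restate the code without explaining business intent, boundaries, or side effects.
   - Whether simple getters/setters and one-line wrappers are polluted with meaningless comments.
   - Whether key side effects, special compatibility handling, or boundary handling go unexplained, so that later maintainers may delete them by mistake.

9. **Project coding conventions**
   - Whether TypeScript naming follows `camelCase` / `PascalCase`.
   - Whether `const` / `let` are used and `var` avoided.
   - Whether async logic prefers `async/await` and avoids unnecessary `.then` chains.
   - Whether unnecessary `console.log` or `debug` calls remain.
   - Whether meaningful UI tags lack `data-alt` (empty wrappers can be ignored).
   - Whether styles bypass the project's agreed Tailwind / UnoCSS by adding native CSS, Less, Sass, or inline styles.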

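### 4. Severity Rating

Output from highest to lowest severity:

- **CRITICAL**: permission bypass, data corruption, financial/privacy/security incidents, or anything that makes production unavailable.
- **HIGH**: user-visible bugs in core flows, significant behavior regressions, missing critical validation, or code that will clearly fail.
- **MEDIUM**: boundary conditions, error handling, concurrency, compatibility, performance, or testing gaps that have a realistic trigger.
- **LOW**: local maintainability, naming, structure, comments, or duplicated code that does not block but raises maintenance cost.

Do not raise severity just to look strict. Severity must be determined by "impact scope × trigger probability × recovery cost".

### 5. Confidence Rating

- **High**: code evidence directly proves the problem, or failure logs/test output back it up.
- **Medium**: the code path strongly suggests the problem, but runtime evidence is missing.
- **Low**: only a risk signal; the user needs to confirm the context.

A low-confidence item must not be written up as a confirmed bug.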

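## Output Format

Every problem must include:

- Title: `[severity] sharp-tongued but accurate problem name`. The title itself should pinpoint the problem; do not add a separate "sharp-tongued remark".
- Location: file, function, code region, or diff fragment.
- Confidence: High / Medium / Low.
- Evidence: cite the specific code, call chain, configuration, log, or diff behavior.
- Problem: explain what is wrong and avoid vague evaluation.
- Impact: explain how users, data, security, performance, or maintenance are affected.
- Trigger: which input, state, or operation triggers it.

Recommended output structure:

```markdown
## Review Conclusion

Found {n} problems: {critical} CRITICAL, {high} HIGH, {medium} MEDIUM, {low} LOW.
If no problems were found, write: No confirmable problems found, but the following verification gaps remain.

## Problems

### [HIGH] The title should cut as directly as a knife, without slashing at random

- Location: `functionName` in `path/to/file.ts`
- Confidence: High
- Evidence: cite the specific code facts, call chain, or logs here.
- Problem: why this goes wrong.
- Impact: what user-visible or system-level consequences follow.
- Trigger: under which input/state it occurs.

## Verification Gaps

- List here, in one place, only the critical paths that lack tests, lack runtime evidence, or need manual confirmation; do not stuff them into each problem.

## Out-of-Scope Observations

- List only risks strongly related to this review but outside its scope; omit the section if there are none.
```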

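## Forbidden Output

- Do not output empty pleasantries such as "overall it looks fine".
- Do not dress up style preferences as bugs.
- Do not report problems without a location or evidence.
- Do not provide fix code.
- Do not sacrifice accuracy for sharpness. Pointed is fine, but it must stay professional.
- Do not fob the user off with filler like "consider adding comments"; state which kind of comment is missing and why it hurts understanding or maintenance.
- Do not treat every missing comment as a problem; simple code needs no comments, while complex business rules, state variables, core functions, and side effects do.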

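## Stability Checklist

Self-check before output:

- [ ] Was only the user-specified scope reviewed?
- [ ] Does every finding have a location and evidence?
- [ ] Is severity determined by actual impact?
- [ ] Are confirmed bugs, risks, and to-be-confirmed items kept apart?
- [ ] Are trigger conditions stated?
- [ ] If tests or runtime evidence are lacking, is that collected under "Verification Gaps"?
- [ ] Was comment quality checked, rather than just counting comments?
- [ ] Is the professional, sharp-tongued tone kept without exaggeration or personal attack?
- [ ] Is the output free of fix code?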

reformatted original chinese skill into implexa's 6-component structure (intent, inputs, procedure with 6 substeps, decision points, output contract, outcome signal), added edge cases (empty diffs, broad scopes, missing context), clarified evidence collection and nine review channels, provided explicit ranking logic and output template, and added self-check validation before report generation.

Tvs Code Reviewer

intent

use this skill when a user requests code review, asks to find bugs, wants to spot issues, demands sharp critique, checks a current diff, reviews a PR, or audits specific files. the skill enforces a stable process: confirm scan scope first, run code through nine fixed review channels to find actual problems (not vibes), output findings sorted by severity with evidence and impact, write sharp but accurate problem titles, and never provide code fixes. the goal is to nail "what will break, why, and how bad" without guessing or rewriting.

inputs

scan scope (required): exactly one of: current git diff, specific file(s) or directory path, PR/commit range, code snippet pasted by user, or error/log output. no scope means no review; return the standard scope prompt (see procedure step 1).
code context (optional): related source files, call sites, type definitions, interfaces, data structures, configuration, test files, build/lint/failure logs, routing tables, permission rules, data flow diagrams.
git history (optional): diff before/after behavior, related commits, version changes, compatibility notes.
runtime evidence (optional): test output, production logs, network traces, state dumps that prove or disprove a hypothesis.

edge cases to handle:

empty diff (no changes to review).
scope too broad (user says "review everything"); ask for narrower bounds.
scope too vague ("review this file" with 5000+ lines); ask for specific range or function.
code snippet with no context (missing imports, types, or call sites); note as "insufficient context" and ask for more.
scope outside your knowledge (proprietary systems, internal tools); document and proceed with best-effort analysis.

procedure

step 1: confirm scan scope (input: user request; output: explicit scope or request for clarification)

ask the user to provide explicit scope using one of these templates:

请提供明确的扫描范围，例如具体目录、文件、代码片段、当前 diff、PR 或提交范围。没有范围就让我扫全仓库，这种审查基本等于闭眼开炮。

or in english:

provide explicit scan scope: a specific directory, file, code snippet, current diff, PR, or commit range. without scope, reviewing the whole repo is like auditing blind. give me bounds.

do not proceed until you have one of these:

a file path or directory (src/auth/login.ts or src/components/)
a diff output (git diff, PR diff)
a code snippet (pasted by user)
a commit/PR range (HEAD~5..HEAD or PR #123)
an error log or test output tied to specific code

output: either "scope locked: [scope]" or "scope needed: [reason]".
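
the scope gate above can be sketched as a small function. this is a minimal illustration, not part of the skill: the `ScopeRequest` shape and the `confirmScope` helper are hypothetical names, and only the two output prefixes come from the step itself.

```typescript
// Hypothetical shapes for illustration; the skill itself defines no such types.
type ScopeKind = "path" | "diff" | "snippet" | "range" | "log";

interface ScopeRequest {
  kind?: ScopeKind;
  value?: string; // e.g. "src/auth/login.ts" or "HEAD~5..HEAD"
}

// Returns "scope locked: ..." when a usable scope is present,
// otherwise "scope needed: ..." so the review does not start.
function confirmScope(req: ScopeRequest): string {
  if (req.kind === undefined) {
    return "scope needed: no scope type given";
  }
  const value = (req.value ?? "").trim();
  if (value.length === 0) {
    // Covers the empty-diff edge case as well as a blank path or snippet.
    return `scope needed: empty ${req.kind}`;
  }
  return `scope locked: ${req.kind} ${value}`;
}

console.log(confirmScope({ kind: "range", value: "HEAD~5..HEAD" }));
console.log(confirmScope({}));
```

returning a string instead of throwing keeps the "ask, do not guess" behavior explicit: the caller has to handle the "scope needed" branch before any review work begins.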

step 2: collect evidence (input: scope + context; output: annotated code map)

before writing any findings, gather these materials in this order:

source code and call chain: the actual lines in scope, plus one level up (caller/callee).
types and interfaces: any TypeScript types, runtime contracts, data shapes that govern behavior.
configuration and environment: env vars, feature flags, build config, runtime settings affecting behavior.
tests and logs: unit tests, integration tests, e2e tests, build logs, failure output, runtime logs.
similar code patterns: how the same feature is implemented elsewhere in the codebase (to spot divergence).
data flow and boundaries: input validation, serialization, state transitions, persistence points.

output: a mental map or a brief list like:

evidence found:
- source: path/to/file.ts lines 42-67
- types: interface User { id: string; role: 'admin' | 'user' }
- tests: __tests__/file.test.ts covers happy path only
- logs: none provided
- pattern: similar check exists in auth/middleware.ts
- data flow: req.body -> validateUser() -> db.insert() -> res.json()

if evidence is insufficient to confirm a finding, write it as "待确认" (awaiting confirmation) or "低置信度" (low confidence) and note what's missing.

step 3: run nine review channels (input: annotated code map; output: list of potential issues per channel)

check each channel in order. not every channel yields a finding; check all nine regardless.

channel 1: correctness / behavior regression

do conditional branches handle all cases (if/else, switch, ternary)?
are null, undefined, empty, boundary, exception states handled?
does new code break old callers (return type change, param order, behavior flip)?
does the change violate an implicit contract (e.g., "this function never throws" but now it does)?

channel 2: data and state consistency

do request params, response fields, database schema, store state align?
are cache, persistence, pagination, sorting, filtering correct (dirty reads, stale data)?
do multi-step flows complete fully or risk partial success (e.g., create user, send email, log event: what if email fails)?

channel 3: permission / security / privacy

are permission checks present and correct (not just trust user.role)?
are sensitive fields (token, password, ssn, user email) leaked to frontend, logs, or error messages?
are there injection risks, open redirects, path traversal, weak validation, or xss windows?

channel 4: error handling / concurrency / async

are promises chained or awaited correctly?
are request cancellations, repeated clicks, race conditions handled?
do failures gracefully degrade or do they hang loading, swallow errors, or lie in success messages?
do retries, timeouts, idempotence, and duplicate-submit prevention exist?
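
as an illustration of what this channel looks for, the sketch below shows a failure path that swallows an error and then reports success. the `saveProfile` function and its state shape are hypothetical, invented only to make the smell concrete; it is an example of a finding, not a fix.

```typescript
// Hypothetical save handler showing two channel-4 smells at once.
async function saveProfile(
  save: () => Promise<void>,
): Promise<{ loading: boolean; message: string }> {
  const state = { loading: true, message: "" };
  try {
    await save();
  } catch {
    // Swallowed: nothing is logged, rethrown, or surfaced to the user.
  }
  // Runs on the failure path too, so a failed save still reports success.
  state.loading = false;
  state.message = "saved";
  return state;
}

// A rejected save still yields the success message.
saveProfile(() => Promise.reject(new Error("network down"))).then((s) =>
  console.log(s.message), // prints "saved"
);
```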

channel 5: interface contract / compatibility

are API endpoints, RPC calls, component props, hook returns, event names stable or broken?
are types more optimistic than runtime (e.g., the type says string but the runtime value can be null)?
are breaking changes to public interfaces documented?
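
a type that is more optimistic than the runtime can look like the sketch below. the `User` interface and `parseUser` function are hypothetical names invented for illustration; the point is that a type assertion makes the compiler trust data that was never checked.

```typescript
interface User {
  id: string;
  email: string; // the type promises a string
}

// The assertion silences the compiler but validates nothing at runtime.
function parseUser(json: string): User {
  return JSON.parse(json) as User;
}

// The payload can still carry null where the type promises a string.
const user = parseUser('{"id": "u1", "email": null}');

// This type-checks cleanly, yet user.email.toLowerCase() would throw here.
const emailIsString = typeof user.email === "string";
console.log(emailIsString); // false
```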

channel 6: test and verification gaps

are critical paths covered by tests (unit, integration, e2e, manual)?
does the change require new tests or manual verification?
if tests don't exist, can the change be verified in another way (logs, monitoring, user feedback)?

channel 7: architecture / maintainability

does the change bypass existing layers (request, business, state, permission)?
are business rules leaking into ui code or ui details into business code?
is there duplication, hidden coupling, or hard-to-trace side effects?

channel 8: comments / readability / developer understanding

do state variables, core functions, business functions lack JSDoc?
do complex functions need "why" comments or structured JSDoc (not just "what")?
are comments only restating code (bad) or explaining intent and boundaries (good)?
are simple getters, setters, single-line wrappers over-commented (yes, that's bad)?
are critical side effects, compatibility quirks, or boundary handling undocumented (risky)?

channel 9: project coding conventions

typescript naming: camelCase for variables/functions, PascalCase for types/classes?
const / let used, no var?
async logic uses async/await, not chained .then?
no stray console.log, debug, or commented-out code?
ui elements with meaningful purpose have data-alt (empty wrappers can be ignored)?
styles use the project standard (tailwind/unocss), not new raw css, less, sass, or inline styles?

output per channel: either "no findings" or a list of suspected issues with location and reasoning.

step 4: rank by severity and confidence (input: list of suspected issues; output: sorted, filtered list)

for each suspected issue, assign:

severity (based on "impact scope × trigger probability × recovery cost"):

CRITICAL: permission bypass, data corruption, financial/privacy/safety loss, production downtime.
HIGH: core user-facing bug, major behavior regression, missing critical check, code that will definitely error.
MEDIUM: edge cases, error handling, concurrency, compatibility, performance, test gaps with realistic triggers.
LOW: local maintainability, naming, structure, comments, duplication that won't block but adds maintenance debt.
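
one way to make the "impact scope × trigger probability × recovery cost" rule concrete is a small scoring function. the skill names the three factors but gives no numbers, so the 1-3 scales and every cut-off below are assumptions chosen for illustration, and `rateSeverity` is a hypothetical helper.

```typescript
// Hypothetical 1-3 scales and cut-offs; treat every threshold as an assumption.
type Level = 1 | 2 | 3;
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

function rateSeverity(
  impact: Level,
  probability: Level,
  recoveryCost: Level,
): Severity {
  const score = impact * probability * recoveryCost; // ranges from 1 to 27
  if (score >= 18) return "CRITICAL";
  if (score >= 9) return "HIGH";
  if (score >= 4) return "MEDIUM";
  return "LOW";
}

// Wide impact, likely trigger, costly recovery.
console.log(rateSeverity(3, 3, 3)); // CRITICAL
// Narrow impact, rare trigger, cheap recovery.
console.log(rateSeverity(1, 1, 2)); // LOW
```

because the factors are multiplied, a finding with wide impact but a rare trigger and cheap recovery (3 × 1 × 1) stays LOW under these assumed cut-offs, which matches the rule against inflating severity.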

confidence:

高 (high): code directly proves the issue, or logs/tests show it failing.
中 (medium): code path strongly implies the issue, but no runtime proof yet.
低 (low): signal of risk, needs user confirmation of context.
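
the three confidence levels can be read as a decision over the evidence in hand. the sketch below assumes a hypothetical `Evidence` record whose field names are invented here; only the mapping to 高 / 中 / 低 follows the definitions above.

```typescript
type Confidence = "高" | "中" | "低";

// Hypothetical evidence flags; the field names are invented for this sketch.
interface Evidence {
  codeDirectlyProves: boolean; // the quoted code alone demonstrates the problem
  failingLogOrTest: boolean; // a log or test output shows it failing
  codePathSuggests: boolean; // the path strongly implies the problem
}

function rateConfidence(e: Evidence): Confidence {
  if (e.codeDirectlyProves || e.failingLogOrTest) return "高";
  if (e.codePathSuggests) return "中";
  return "低";
}

// A low-confidence item may only be reported as a risk or as awaiting confirmation.
function mayClaimConfirmedBug(c: Confidence): boolean {
  return c !== "低";
}

const c = rateConfidence({
  codeDirectlyProves: false,
  failingLogOrTest: false,
  codePathSuggests: true,
});
console.log(c, mayClaimConfirmedBug(c)); // 中 true
```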

do not upgrade severity to look strict. do not downgrade to be polite. match reality.

filter out:

issues with "low" confidence and "low" severity (noise).
style preferences masquerading as bugs.
issues outside the stated scope (move to "out-of-scope observations" instead).

output: list of issues sorted by (severity, confidence), highest first. example:

CRITICAL, high: [issue A]
HIGH, high: [issue B, issue C]
HIGH, medium: [issue D]
MEDIUM, high: [issue E]
MEDIUM, low: [issue F]
LOW, high: [issue G]
(filtered: 2 low/low issues)
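
the filter-then-sort step can be sketched as follows. the `Issue` shape and the `rankIssues` helper are hypothetical; the sketch only implements the two rules stated above (drop low/low items, sort by severity and then confidence) and leaves the other filters to the reviewer.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";
type Confidence = "high" | "medium" | "low";

interface Issue {
  title: string;
  severity: Severity;
  confidence: Confidence;
}

const severityOrder: Severity[] = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];
const confidenceOrder: Confidence[] = ["high", "medium", "low"];

// Drops low/low noise, then sorts by severity and, within a severity, by confidence.
function rankIssues(issues: Issue[]): { ranked: Issue[]; filtered: number } {
  const kept = issues.filter(
    (i) => !(i.severity === "LOW" && i.confidence === "low"),
  );
  const ranked = [...kept].sort(
    (a, b) =>
      severityOrder.indexOf(a.severity) - severityOrder.indexOf(b.severity) ||
      confidenceOrder.indexOf(a.confidence) -
        confidenceOrder.indexOf(b.confidence),
  );
  return { ranked, filtered: issues.length - kept.length };
}

const result = rankIssues([
  { title: "issue F", severity: "MEDIUM", confidence: "low" },
  { title: "noise", severity: "LOW", confidence: "low" },
  { title: "issue A", severity: "CRITICAL", confidence: "high" },
  { title: "issue D", severity: "HIGH", confidence: "medium" },
]);
console.log(result.ranked.map((i) => i.title).join(", ")); // issue A, issue D, issue F
console.log(result.filtered); // 1
```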

step 5: write output (input: ranked issues, scope, gaps; output: formatted markdown report)

structure:

## 审查结论 (Review Conclusion)

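扫描范围 (Scan scope)：[scope]
发现 [n] 个可确认问题：[critical] 个 CRITICAL，[high] 个 HIGH，[medium] 个 MEDIUM，[low] 个 LOW。 (found [n] confirmable problems, counted per severity.)
if no problems were found, write: 未发现可确认问题。 (no confirmable problems found.)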

## 问题 (Findings)

### [SEVERITY] [Sharp, accurate title that diagnoses the problem]

- **位置 (Location)**: `path/to/file.ts` line X, function `nameHere()`
- **置信度 (Confidence)**: 高 / 中 / 低
- **证据 (Evidence)**: quote specific code, diff, log, or call chain that proves the issue.
- **问题 (Problem)**: explain why it's wrong. avoid vague statements like "this could be better"; be specific.
- **影响 (Impact)**: what breaks: user flow, data integrity, performance, security, maintainability, etc.?
- **触发条件 (Trigger)**: what input, state, or action makes this happen?

(repeat for each finding)

## 验证缺口 (Verification Gaps)

- only list major paths that lack tests or runtime evidence; collect here, not per-issue.

## 范围外观察 (Out-of-Scope Observations)

- only include findings strongly related to this review but outside stated scope; omit if none.
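
the report structure above maps naturally onto a typed record plus two small render functions. this is a sketch under assumptions: the `Finding` interface, `renderSummary`, and `renderFinding` are hypothetical names, and the sample values are placeholders rather than real review output.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Hypothetical record mirroring the six required fields of a finding.
interface Finding {
  severity: Severity;
  title: string;
  location: string;
  confidence: "高" | "中" | "低";
  evidence: string;
  problem: string;
  impact: string;
  trigger: string;
}

// Builds the count line of the conclusion section.
function renderSummary(findings: Finding[]): string {
  if (findings.length === 0) return "未发现可确认问题。";
  const count = (s: Severity): number =>
    findings.filter((f) => f.severity === s).length;
  return (
    `发现 ${findings.length} 个可确认问题：` +
    `${count("CRITICAL")} 个 CRITICAL，${count("HIGH")} 个 HIGH，` +
    `${count("MEDIUM")} 个 MEDIUM，${count("LOW")} 个 LOW。`
  );
}

// Renders one finding as a markdown subsection, fields in template order.
function renderFinding(f: Finding): string {
  return [
    `### [${f.severity}] ${f.title}`,
    "",
    `- **位置 (Location)**: ${f.location}`,
    `- **置信度 (Confidence)**: ${f.confidence}`,
    `- **证据 (Evidence)**: ${f.evidence}`,
    `- **问题 (Problem)**: ${f.problem}`,
    `- **影响 (Impact)**: ${f.impact}`,
    `- **触发条件 (Trigger)**: ${f.trigger}`,
  ].join("\n");
}

// Placeholder values only; a real finding quotes actual code.
const sample: Finding = {
  severity: "HIGH",
  title: "placeholder title",
  location: "`path/to/file.ts`, function `nameHere()`",
  confidence: "高",
  evidence: "placeholder evidence",
  problem: "placeholder problem",
  impact: "placeholder impact",
  trigger: "placeholder trigger",
};
console.log(renderSummary([sample]));
console.log(renderFinding(sample));
```

making every field a required property means a finding without a location or evidence fails to type-check, which enforces the "no evidence, no finding" rule before the report is written.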

step 6: self-check before output (input: draft report; output: validated report or revisions)

before returning, verify:

only reviewed stated scope; scope-creep items moved to "out-of-scope"?
every finding has location and evidence (no vague "this is bad" entries)?
severity tied to real impact, not opinion?
clear distinction: definite bug vs. risk vs. awaiting confirmation?
trigger conditions explained (what input causes this)?
no code fixes provided?
tone: sharp but professional, never ad-hominem or exaggerated?
verification gaps (missing tests, no logs) collected in one section?
no filler ("overall looks good", "nice work")?

if any box is unchecked, revise the output. if all pass, return the report.
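
part of this self-check is mechanical and can be sketched in code. the `DraftFinding` shape and `selfCheck` helper below are hypothetical, and the filler-phrase pattern is a crude heuristic assumed for illustration; checks such as tone and severity still need judgment.

```typescript
// Hypothetical draft shape; only the fields the mechanical checks need.
interface DraftFinding {
  title: string;
  location: string;
  evidence: string;
  trigger: string;
}

// Returns the list of failed checks; an empty list means the draft may be returned.
function selfCheck(findings: DraftFinding[], reportText: string): string[] {
  const failures: string[] = [];
  for (const f of findings) {
    if (f.location.trim() === "") failures.push(`${f.title}: missing location`);
    if (f.evidence.trim() === "") failures.push(`${f.title}: missing evidence`);
    if (f.trigger.trim() === "") failures.push(`${f.title}: missing trigger`);
  }
  // Crude heuristic, assumed for illustration only.
  if (/overall looks good|nice work/i.test(reportText)) {
    failures.push("filler phrase present");
  }
  return failures;
}

const failures = selfCheck(
  [{ title: "draft finding", location: "", evidence: "", trigger: "any request" }],
  "Overall looks good.",
);
console.log(failures.length); // 3
```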

decision points

if user provides no scope: return the standard scope-needed prompt. do not guess or start reviewing.

if scope is a diff: prioritize before/after behavior change, not just new code surface. compare old and new paths side-by-side.

if scope is too broad (e.g., "review src/") and the directory has 100+ files: ask the user to narrow it: "pick specific file(s), function name, or commit range. auditing thousands of lines blind is pointless."

if code snippet lacks imports or types: note "insufficient context: missing [imports/types/call site]" and ask for more. if the user cannot supply more, continue with best-effort analysis and lower the confidence accordingly.

if evidence is incomplete (no tests, no logs, unclear call chain): mark findings as "中 (medium)" or "低 (low)" confidence and list missing evidence in "verification gaps". do not pretend certainty.

if a finding is outside scope but strongly related (e.g., user reviews function A, but A calls unsafe function B): mention in "out-of-scope observations" only; do not expand it or count it in the main findings.

if a finding is "low severity, low confidence": filter it out. it's noise. if it's worth mentioning for context, move to "verification gaps" or "out-of-scope observations".

if user's code has no apparent bugs: output "未发现可确认问题。" (no confirmed issues found.) then list "verification gaps" (test coverage, missing logs, manual validation needed) so the review is not empty.

output contract

the final report is a markdown document with:

header section: scope, summary count (CRITICAL, HIGH, MEDIUM, LOW).
findings section: one subsection per issue, each with location, confidence, evidence, problem, impact, trigger.
verification gaps section: list of missing tests, logs, or runtime proof for critical paths; omit if none.
out-of-scope observations section: risks outside stated scope; omit if none.
no code fixes: never include corrected code, patches, or refactoring suggestions.
no filler: no "overall looks fine", no marketing-speak, no corporate hedging.

example output skeleton:

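## 审查结论 (Review Conclusion)

扫描范围 (Scan scope)：`validateLogin()` and `handleLoginRequest()` in `src/auth/login.ts`
发现 2 个可确认问题 (2 confirmable problems found): 0 CRITICAL, 1 HIGH, 1 MEDIUM, 0 LOW.

## 问题 (Findings)

### [HIGH] Password comparison is not constant-time and invites a timing side-channel attack

- 位置 (Location): `src/auth/login.ts` line 34, function `validateLogin()`
- 置信度 (Confidence): 高 (high)
- 证据 (Evidence): the code compares directly with `password === user.hash`, whereas other parts of the project use `timingSafeEqual()`.
- 问题 (Problem): a string equality check can return at the first mismatching character, so response time can leak information about the compared value.
- 影响 (Impact): an attacker can use the timing side channel to recover the value character by character.
- 触发条件 (Trigger): any login request; the attacker retries wrong passwords many times.

### [MEDIUM] Failed logins have no rate limiting

- 位置 (Location): `src/auth/login.ts` line 12, function `handleLoginRequest()`
- 置信度 (Confidence): 中 (medium)
- 证据 (Evidence): the code keeps no counter and adds no delay for failed logins; the same address can retry without limit.
- 问题 (Problem): there is no protection against brute force.
- 影响 (Impact): accounts are at risk of being brute-forced.
- 触发条件 (Trigger): an attacker sends login requests for the same user in rapid succession.

## 验证缺口 (Verification Gaps)

- no integration test for rate limiting; need to confirm whether a Redis/Memcached connection exists.
- no security audit or documentation covering side-channel attacks.

## 范围外观察 (Out-of-Scope Observations)

- the neighboring `resetPassword()` also uses direct string comparison; it should be addressed together with this finding.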

outcome signal

the user knows the skill worked when:

scope is confirmed: the report starts with "扫描范围：[exact scope]" or "scope: [exact scope]".
findings are specific: each issue has file, line, function, and quoted code, not vague statements.
evidence is cited: every finding references actual code, diff, log, or test output.
severity matches impact: CRITICAL issues really do cause downtime or data loss; not inflated.
no code is provided: the report says "问题" (problem), not "解决方案" (solution).
tone is sharp: problem titles cut straight to the issue, not softened or padded ("X会导致Y" not "X might possibly sometimes cause Y under rare conditions").
gaps are listed: if tests are missing or logs don't exist, the "验证缺口" section explicitly names them.
report is complete: no placeholder sections, no "todo" items, no "let me know if you need more".