Autoresearch Loop

Item: Autoresearch Loop
Rating: 8.3
Author: Implexa

Run an explicit, bounded modify-verify-decide loop toward a measurable metric with approval gates, scoped edits, and rollback proof.

SkillRank score: 8.3 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-06-06

autoresearch-loop enforces a bounded, approval-gated iteration cycle for measurable optimization, with atomic changes, verification checkpoints, guard rails against regression, and explicit escalation rules when progress stalls.

structure: 10.0
trigger phrases: 8.0
procedure: 9.0
edge cases: 8.0
documentation: 8.0

original SKILL.md from clawhub

---
name: autoresearch-loop
description: "Run an explicit, bounded modify-verify-decide loop toward a measurable metric with approval gates, scoped edits, and rollback proof."
metadata:
version: "0.2.1"
---
# Autoresearch Loop

Use this skill only when the user explicitly asks for an autoresearch or iterative improvement loop. It is for bounded, measurable optimisation in a version-controlled workspace. It must not start from casual improvement language, and it must not run unattended until the user has approved the goal, commands, scope, rollback strategy, run mode, and iteration cap.

The goal is measurable. Each iteration makes one atomic change, verifies it, and keeps or discards the result. The loop stops when the goal is met, the approved iteration cap is reached, the user stops it, or a blocker/safety gate is hit.

## Core Loop

```text
1. Confirm approved run contract
2. Read context + lessons file
3. Pick ONE hypothesis
4. Make ONE atomic change inside approved scope
5. Snapshot/commit before verification
6. Run approved VERIFY command
7. Run approved GUARD command
8. Decision: keep / discard / rework
9. Log the result
10. Health check and safety gate
11. Repeat only within approved cap
```

Read `references/loop-protocol.md` for the full loop spec.
Read `references/pivot-protocol.md` for the escalation ladder.
Read `references/lessons-protocol.md` for cross-run learning.

## Before Starting

Confirm with the user and do not start until the contract is explicit:
- **Goal** — one sentence describing what you want to achieve
- **Metric** — what number is measured, direction, baseline, and target
- **Verify command** — exact command used to measure the metric
- **Guard command** — exact command that must keep passing
- **Scope** — files/directories allowed to change, and files/directories that are forbidden
- **Rollback strategy** — normal branch/worktree revert, or isolated disposable reset
- **Run mode** — foreground by default; background/unattended only after explicit approval
- **Iteration cap** — required for background/unattended runs; recommended for foreground runs
- **External research policy** — web/search is off by default unless explicitly approved
- **Data boundary** — do not expose private code, secrets, logs, or proprietary data to external sources

Show the run contract and ask for confirmation. One round minimum. Then start only after the user says go.

## Verify vs Guard

- **Verify** = "Did the target metric improve?" — measures progress
- **Guard** = "Did anything else break?" — prevents regressions
- Guard files are never modified
- If verify passes but guard fails: rework up to 2 attempts, then discard

## Decision Rules

| Result | Action |
|--------|--------|
| Verify pass + Guard pass | Keep. Extract lesson. |
| Verify pass + Guard fail | Rework within approved scope (max 2 attempts). If still failing, discard. |
| Verify fail | Discard using approved rollback. |
| Crash | Stop unless the fix is clearly inside approved scope and non-destructive. |
| Syntax error | Fix immediately only if caused by the current iteration and inside approved scope. |

## Escalation Ladder

See `references/pivot-protocol.md` for full details.

| Trigger | Action |
|---------|--------|
| 3 consecutive discards | REFINE — adjust within current strategy |
| 5 consecutive discards | PIVOT — abandon strategy, try fundamentally different approach |
| 2 PIVOTs without improvement | Ask before external research unless pre-approved |
| 3 PIVOTs without improvement | Soft blocker — stop and report to human |

A single successful keep resets all counters.

## Long Run Hygiene

- Every completed experiment must be recorded before the next one starts
- Re-read original instructions every 10 iterations to prevent context drift
- Log: one row per iteration (iteration, commit/snapshot, metric, delta, status, description)
- For background runs, send progress at the approved cadence and stop at the cap

## Lessons

Extract structured lessons after:
- Every kept iteration (what worked and why)
- Every PIVOT decision (what failed and why)
- Run completion

Store in `autoresearch-lessons.md` in the working repo root unless the user chose another path. Do not commit this file unless the user explicitly asks. Consult it at the start of each run. Keep about 50 entries, summarising older ones with time decay.

## Safety

- Foreground mode is the default. Background/unattended mode requires explicit approval and an iteration cap.
- Make changes only inside approved scope.
- Commit, snapshot, or otherwise record only your own changes before verification.
- Revert only changes made by the current loop.
- Never reset unrelated user work.
- Never modify guard files unless the user explicitly changes the scope contract.
- Do not run destructive commands, deploy, publish, push, or touch production systems unless explicitly approved for this run.
- Do not use web search or external sources unless the run contract allows it.
- Do not paste private code, secrets, logs, customer data, or proprietary data into external services.
- Stop and report if the metric cannot be measured mechanically.
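
The scope rules above lend themselves to a mechanical check before any change is committed. The following is a minimal sketch, not part of the skill itself: the function names `in_scope` and `_is_under` are hypothetical, paths are assumed to be relative to the repo root with forward slashes, and the example paths in the usage note are illustrative.

```python
from pathlib import PurePosixPath
from typing import Iterable


def _is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path is root itself or lies somewhere below it."""
    return path == root or root in path.parents


def in_scope(changed: str, allowed: Iterable[str], forbidden: Iterable[str]) -> bool:
    """Check one changed file against the approved scope (hypothetical helper).

    Assumption: forbidden entries always win over allowed entries.
    """
    path = PurePosixPath(changed)
    # Reject absolute paths and ".." segments instead of trying to resolve them.
    if path.is_absolute() or ".." in path.parts:
        return False
    if any(_is_under(path, PurePosixPath(f)) for f in forbidden):
        return False
    return any(_is_under(path, PurePosixPath(a)) for a in allowed)
```

For example, with `allowed=["src/checkout/"]` and `forbidden=[".env"]`, a change to `src/checkout/cart.ts` passes and a change to `.env` or `README.md` does not. Treating anything not explicitly allowed as out of scope is a design choice of this sketch; it errs on the side of stopping.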


changes in the rewritten version: extracted implicit decision rules and escalation triggers into an explicit decision points section, documented all required inputs with setup guidance and edge cases, restructured the procedure as numbered steps with clear input/output pairs, added an output contract specifying log format and report structure, and clarified outcome signals for both success and failure.

Autoresearch Loop

intent

run a bounded, measurable optimization loop that makes one atomic change per iteration, verifies the result against a metric, and keeps or discards the change based on explicit decision rules. use this skill only when the user explicitly asks for autoresearch or iterative improvement in a version-controlled workspace. the loop stops when the goal is met, the iteration cap is reached, the user stops it, or a safety gate triggers. the user must approve the run contract before the run starts, and unattended background runs additionally require explicit approval and an iteration cap.

inputs

goal statement (string): one sentence describing the target outcome. required. example: "reduce p95 latency from 500ms to 200ms on the checkout endpoint".
metric name and direction (string): what you measure and which direction is better. required. example: "p95_latency_ms, lower is better".
metric baseline and target (numbers): current value and desired value. required. example: "baseline 500, target 200".
verify command (string): exact shell command that measures the metric. required. must be deterministic and idempotent. example: npm run test:perf -- --metric=p95_latency.
guard command (string): exact shell command that must keep passing. required. detects regressions in other systems. example: npm run test:integration && npm run lint.
scope allowed (list of paths): files and directories you may modify. required. example: src/checkout/, src/lib/cache.ts.
scope forbidden (list of paths): files and directories you must never touch. required. example: src/secrets/, .env, tests/fixtures/prod-data/.
rollback strategy (string): how to undo failed iterations. required. either "git branch/worktree revert" or "isolated disposable reset". example: "git stash + git checkout main after each failed iteration".
run mode (enum): "foreground" (default, interactive, stops for human input) or "background" (unattended, requires iteration cap and approval). required.
iteration cap (integer): max iterations before stopping. required for background mode, strongly recommended for foreground mode. example: 20.
external research policy (enum): "off" (default, no web/search), "on-approval" (allowed if pre-approved in this run), or "auto" (allowed without per-iteration approval). required.
data boundary policy (string): confirmation that no private code, secrets, logs, customer data, or proprietary info will be exposed to external sources. required if external research is enabled.
working repository (path): root directory of the version-controlled workspace. required.
lessons file location (path, optional): where to store cross-run learning. default: autoresearch-lessons.md in repo root.
progress report cadence (string, optional): for background runs only. examples: "every 5 iterations", "hourly". default: on completion only.
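
As a hedged illustration, the inputs above could be held in one structure that is validated before the run starts. This is a sketch only: the class name `RunContract`, its field names, and the `problems` and `goal_met` methods are hypothetical and not defined by the skill, and only a few of the listed inputs are checked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunContract:
    """Hypothetical container for the approved run contract."""
    goal: str
    metric_name: str
    lower_is_better: bool
    baseline: float
    target: float
    verify_command: str
    guard_command: str
    scope_allowed: List[str]
    scope_forbidden: List[str]
    rollback_strategy: str
    run_mode: str = "foreground"          # default per the inputs list
    iteration_cap: Optional[int] = None   # required for background mode
    external_research: str = "off"        # default per the inputs list

    def problems(self) -> List[str]:
        """Return reasons the run must not start; an empty list means none found."""
        issues: List[str] = []
        if self.run_mode not in ("foreground", "background"):
            issues.append("run mode must be 'foreground' or 'background'")
        if self.run_mode == "background" and self.iteration_cap is None:
            issues.append("background mode requires an iteration cap")
        if self.iteration_cap is not None and self.iteration_cap < 1:
            issues.append("iteration cap must be at least 1")
        if self.external_research not in ("off", "on-approval", "auto"):
            issues.append("unknown external research policy")
        if not self.scope_allowed:
            issues.append("allowed scope must list at least one path")
        return issues

    def goal_met(self, value: float) -> bool:
        """Compare a measured value with the target in the metric's direction."""
        return value <= self.target if self.lower_is_better else value >= self.target
```

A passing `problems()` check does not replace user confirmation; it only catches contracts that should be rejected outright, such as a background run without a cap.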

procedure

1. confirm run contract with user: display the complete run contract (goal, metric, verify command, guard command, scope, rollback strategy, run mode, iteration cap, external research policy, data boundary). ask for explicit approval. do not proceed without user confirmation. required: at least one round of confirmation.
2. initialize workspace: enter the working repository. check that version control is clean (no uncommitted changes unrelated to this loop). if using git worktree mode for rollback, create a worktree now. log the starting commit hash or snapshot id.
3. read lessons and context: load and parse the lessons file (default: autoresearch-lessons.md). extract the last 5-10 relevant entries and summarize what has worked and failed in prior iterations. read any user-provided context files. use this to inform the first hypothesis.
4. log run start: write a run header to the lessons file with timestamp, goal, metric baseline, run mode, iteration cap, and user id or label.
5. start iteration loop (repeat until cap, goal met, or stop signal):

5a. read current state: check that the repo is clean and at the expected commit/snapshot. at least every 10 iterations, re-read the original run contract and goal statement to prevent context drift.

5b. form one hypothesis: based on lessons, prior results, and the metric, pick one atomic change to test. write the hypothesis in plain language: "if i [change X], then [metric] will [improve] because [reason]". log it.

5c. make one atomic change: modify only files within approved scope. a single change means one logical unit: one function, one config parameter, one data structure, or one algorithm swap, not several changes in sequence.

5d. snapshot before verify: record the change as a commit or snapshot (git commit, git stash, or a snapshot id from your rollback tool), with a message that references the iteration number and hypothesis. log the commit hash or snapshot id.

5e. run verify command: execute the approved verify command. capture stdout, stderr, and exit code. parse the metric value. compute delta from baseline. log result.

5f. run guard command: execute the approved guard command. capture stdout, stderr, and exit code. record pass/fail.

5g. make decision: apply decision rules (see decision points section below). decision is: keep, discard, or rework.

5h. log iteration result: write one row to the loop log (iteration number, commit/snapshot id, metric value, delta, status, description, timestamp).

5i. health check and safety gate: check for crash, infinite loop, or resource exhaustion. if detected, stop and report to user unless the fix is clearly in approved scope and non-destructive. check escalation ladder (see decision points). if soft blocker triggered, ask user before continuing.

5j. extract lesson if kept: if decision is keep, extract a structured lesson: what worked, why, any side effects observed. append to lessons file.

5k. prepare next iteration: if continuing, return to step 5a.

6. run completion: exit the loop. log final metric, final status (goal met / cap reached / user stop / blocker), total iterations, total keeps, total discards, total pivots. extract final summary lesson if applicable. report to user with progress summary.
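
The verify and guard steps (5e and 5f) can be sketched in Python as below. Several things here are assumptions rather than part of the skill: the helper names `run_command`, `parse_metric`, and `measure_iteration`; the convention that the verify command prints the metric as the last non-empty line of stdout; and the `reference` parameter, which stands for the value to beat (taken here to be the last kept value, or the baseline on the first iteration).

```python
import subprocess
from typing import Optional, Tuple


def run_command(command: str, timeout_s: float = 600.0) -> Tuple[int, str, str]:
    """Run an approved command; return (exit code, stdout, stderr).

    shell=True because approved commands may chain steps with "&&".
    Pass only the exact command strings from the approved contract.
    Raises subprocess.TimeoutExpired if the command exceeds timeout_s.
    """
    done = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout_s
    )
    return done.returncode, done.stdout, done.stderr


def parse_metric(stdout: str) -> Optional[float]:
    """Assumed convention: the metric is the last non-empty line of stdout."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    if not lines:
        return None
    try:
        return float(lines[-1])
    except ValueError:
        return None


def measure_iteration(
    verify_command: str,
    guard_command: str,
    baseline: float,
    reference: float,
    lower_is_better: bool = True,
) -> dict:
    """Run verify, then guard, and report what the decision step needs."""
    code, out, _err = run_command(verify_command)
    value = parse_metric(out) if code == 0 else None
    if value is None:
        # The metric cannot be measured mechanically: the loop should stop.
        return {"measurable": False}
    guard_code, _out, _err = run_command(guard_command)
    improved = value < reference if lower_is_better else value > reference
    return {
        "measurable": True,
        "metric": value,
        "delta_from_baseline": value - baseline,
        "verify_passed": improved,
        "guard_passed": guard_code == 0,
    }
```

The returned dictionary carries exactly the fields the decision and logging steps consume; how a real verify command reports its metric will differ per project, so `parse_metric` is the part most likely to need replacing.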

decision points

if verify passes AND guard passes: keep the change. extract lesson. reset all escalation counters (consecutive discard count, pivot count). commit or lock the snapshot. proceed to next iteration.

if verify passes BUT guard fails: rework is allowed. revert to the snapshot before verify. attempt up to 2 reworks of the same hypothesis (modify the change but keep the same core idea). if both reworks also fail guard, discard the entire hypothesis. increment consecutive discard counter. revert to the last kept state using the approved rollback strategy. proceed to next iteration.

if verify fails (metric did not improve or got worse): discard immediately. revert to the last kept state using the approved rollback strategy. increment consecutive discard counter. log the result as discarded. proceed to next iteration.

if the repo crashes or becomes unrunnable: stop the loop unless the fix is clearly inside approved scope, non-destructive, and you are certain it is caused by the current iteration. if unsure, ask the user. never reset unrelated user work.

if a syntax error occurs: fix it immediately only if it was introduced in the current iteration, is inside approved scope, and is the only thing preventing verify/guard from running. otherwise, discard the iteration and revert.
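
The verify/guard rules above reduce to a small pure function. This is a minimal sketch: `Decision`, `MAX_REWORKS`, and `decide` are hypothetical names, and the crash and syntax-error cases are left out on purpose because they depend on judgment about scope and cause.

```python
from enum import Enum


class Decision(Enum):
    KEEP = "keep"
    REWORK = "rework"
    DISCARD = "discard"


MAX_REWORKS = 2  # "attempt up to 2 reworks of the same hypothesis"


def decide(verify_passed: bool, guard_passed: bool, reworks_used: int) -> Decision:
    """Map one iteration's verify/guard outcome to keep, rework, or discard."""
    if not verify_passed:
        return Decision.DISCARD      # metric did not improve
    if guard_passed:
        return Decision.KEEP         # improved and nothing else broke
    if reworks_used < MAX_REWORKS:
        return Decision.REWORK       # improved, but the guard regressed
    return Decision.DISCARD          # reworks exhausted
```

Keeping the rule free of side effects means the rollback and counter updates can be handled, and tested, separately from the decision itself.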

if 3 consecutive discards occur: refine. adjust the strategy within the current hypothesis class. examples: try smaller steps, adjust parameters, try a different subset of the scope. reset the consecutive discard counter if you get one keep.

if 5 consecutive discards occur: pivot required. abandon the current strategy and try a fundamentally different approach to the goal. examples: if you were optimizing one config, try a different config; if you were rewriting one function, try algorithmic change instead. increment pivot counter. reset consecutive discard counter.

if 2 pivots have occurred without improvement: gate external research. if external research policy is "on-approval" or "off", ask the user for permission to search or consult external sources. if policy is "auto", proceed. if user denies or no policy allows it, proceed with internal reasoning only.

if 3 pivots have occurred without improvement: soft blocker. stop the loop and report findings to the user. do not continue unattended. ask for human guidance on next steps.
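
The escalation counters described above can be sketched as a small state object. The class name `EscalationState` and the returned signal strings are hypothetical. One reading is assumed where the text is not explicit: when the third pivot would be triggered, the loop stops with a soft blocker instead of pivoting again.

```python
from dataclasses import dataclass


@dataclass
class EscalationState:
    """Hypothetical tracker for the discard and pivot counters."""
    consecutive_discards: int = 0
    pivots_without_improvement: int = 0

    def record_keep(self) -> None:
        # A keep resets all escalation counters.
        self.consecutive_discards = 0
        self.pivots_without_improvement = 0

    def record_discard(self) -> str:
        """Count one discard and return the escalation signal it triggers."""
        self.consecutive_discards += 1
        if self.consecutive_discards == 3:
            return "REFINE"
        if self.consecutive_discards >= 5:
            # A pivot resets the discard counter and increments the pivot counter.
            self.consecutive_discards = 0
            self.pivots_without_improvement += 1
            if self.pivots_without_improvement >= 3:
                return "SOFT_BLOCKER"  # assumed: stop instead of a third pivot
            if self.pivots_without_improvement == 2:
                return "PIVOT_GATE_RESEARCH"
            return "PIVOT"
        return "CONTINUE"
```

Under these assumptions, a run with no keeps at all reaches the soft blocker after 15 consecutive discards, which gives a rough upper bound on wasted iterations.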

if the approved iteration cap is reached: stop the loop. log as "cap reached". report findings.

if external research is enabled: before using web search, web scraping, or external APIs, confirm that no private code, secrets, logs, or customer data will be exposed. do not paste proprietary info into external services.

if the metric cannot be measured mechanically: stop the loop. report to the user that the verify command must be deterministic and measurable. do not attempt manual measurement or eye-balling.

if run mode is background and no iteration cap was set: reject the run. do not start. require the user to set an explicit cap before proceeding in unattended mode.

output contract

loop log format: csv or markdown table, one row per iteration, so the file stays machine-parseable. columns: iteration number, commit/snapshot id (short form ok), metric value, delta from baseline, status (keep/discard/rework/blocker), description (one-line hypothesis and reason for decision), timestamp. location: autoresearch-loop.log in repo root unless user specifies otherwise.
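
One way to produce such a row in CSV form is sketched below. The helper name `format_log_row`, the short column names in `LOG_COLUMNS`, and the ISO 8601 UTC timestamp format are assumptions; the skill only fixes which columns must appear.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Optional

# Column order follows the loop log description; the short names are assumed.
LOG_COLUMNS = ["iteration", "commit", "metric", "delta", "status", "description", "timestamp"]


def format_log_row(
    iteration: int,
    commit: str,
    metric: float,
    baseline: float,
    status: str,
    description: str,
    timestamp: Optional[datetime] = None,
) -> str:
    """Render one iteration as a CSV line; delta is measured from the baseline."""
    ts = (timestamp or datetime.now(timezone.utc)).isoformat()
    buffer = io.StringIO()
    # csv.writer quotes fields that contain commas, so descriptions stay parseable.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(
        [iteration, commit, f"{metric:g}", f"{metric - baseline:+g}", status, description, ts]
    )
    return buffer.getvalue()
```

Using the `csv` module instead of joining strings by hand matters mainly for the description column, which is free text and may itself contain commas or quotes.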

lessons file format: plain text or markdown. each entry is a short summary: what was tested, result, why it worked or failed, any caveats. entries are sorted reverse-chronological (newest first) or by relevance. keep approx 50 entries (older entries summarized with time decay). location: autoresearch-lessons.md in repo root unless user specifies otherwise. do not commit this file unless user explicitly asks.

run summary report: text report printed to stdout at completion. includes: start time, end time, goal statement, baseline metric, final metric, delta, total iterations, total keeps, total discards, total pivots, final status (goal met / cap reached / user stop / blocker), key lessons learned, and any blocking issues. example:

AUTORESEARCH RUN COMPLETE
Goal: reduce p95 latency to 200ms
Baseline: 500ms, Target: 200ms, Final: 275ms
Iterations: 18/20, Keeps: 8, Discards: 10, Pivots: 1
Status: cap reached
Key Lesson: connection pooling had larger effect than query optimization
Blocking Issues: none

commit/snapshot trail: each iteration creates an immutable commit or snapshot. the rollback strategy determines whether these are git commits (reversible via git) or snapshot ids (reversible via snapshot restore). all snapshots/commits must be logged in the loop log so the user can inspect or replay any iteration.

outcome signal

the skill worked if:

the run contract was confirmed by the user before the loop started.
the loop made progress toward the goal (metric moved in the right direction over multiple iterations) or clearly identified a blocker.
every iteration has a logged row in the loop log with metric value, commit/snapshot id, and decision.
the user can inspect any iteration by reverting to its snapshot/commit using the rollback strategy.
guard command kept passing (or regressions were fixed within 2 reworks), preventing silent breakage.
lessons were extracted and logged in the lessons file for future runs.
the loop stopped cleanly (goal met, cap reached, user stop, or blocker) with a summary report printed to the user.
no unrelated user work was reset or lost.
no private code, secrets, or proprietary data was exposed to external sources.
if external research was used, it was pre-approved in the run contract.

the skill did not work if:

the run started without explicit user approval of the contract.
modifications were made outside approved scope.
the metric could not be measured mechanically.
the loop ran unattended without an iteration cap.
an iteration was not logged in the loop log.
the user lost the ability to revert an iteration.
guard command failed and the change was neither fixed nor discarded.
the loop ran past the escalation ladder without user input.