Shed

Item: Shed
Rating: 8.3
Author: Implexa

Context window hygiene for long-running LLM agents. Decision rules for when and how to compress, mask, switch, or delegate context — backed by research (JetB...

SkillRank score: 8.3/10

evaluated by implexa, claude-haiku-4-5 · 2026-05-31

shed provides decision rules and procedures for managing context window growth in long-running agents, prioritizing simple masking of tool outputs over costly summarization and using architectural patterns like typed memory blocks and positional placement.

structure: 9.0
trigger phrases: 9.0
procedure: 8.0
edge cases: 7.0
documentation: 9.0


---
name: shed
description: Context window hygiene for long-running LLM agents. Decision rules for when and how to compress, mask, switch, or delegate context — backed by research (JetBrains/NeurIPS 2025, OpenHands, Letta/MemGPT, LLMLingua). Use when an agent runs for extended sessions, accumulates large tool outputs, approaches context limits, or suffers from compaction/overflow. Also use when designing agent architectures that need to manage context over time.
---

# Shed — Context Hygiene for Agents

*Shed what you don't need. Keep what matters.*

Named for molting — the process of shedding an outer layer to grow. Your context window is your skin. When it gets too heavy, shed the dead weight.

## Core Principle

**Tool outputs are 84% of your context growth but the lowest-value tokens you carry.** (Lindenbauer et al., NeurIPS 2025 DL4C workshop, measured on SWE-agent). Everything flows from this.

## The Rules

### After Every Tool Call

1. **Extract, don't accumulate.** When a tool returns large output (file contents, search results, logs, API responses), immediately write the key facts to a file or compress into bullets. The raw output is now disposable.
2. **Ask: "Will I need this verbatim later?"** Almost never. The answer you extracted is what matters, not the 500 lines that contained it.

### When Context Reaches ~70%

3. **Trigger condensation.** Don't wait for the platform to compact you — that's losing control of your own memory. At 70%, actively shed.
4. **Mask old tool outputs first** (free, no LLM calls). Keep your reasoning and action history intact — you need your decision chain, not the raw `ls -la` from 20 turns ago.
5. **Summarize reasoning only as backup.** If masking isn't enough, compress old reasoning turns. But this is lossy and costs an LLM call — use sparingly.
6. **Never re-summarize a summary.** If you've already condensed once and context is growing again, switch context or spawn a sub-agent. Recursive summarization compounds errors.
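
A minimal sketch of rule 4 (observation masking), assuming a simple in-memory turn history. The `Turn` type, the `keep_recent` parameter, and the placeholder text are hypothetical names chosen for illustration; they are not taken from any specific framework.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Turn:
    role: str     # "reasoning", "action", or "tool_output" (hypothetical labels)
    content: str


def mask_old_tool_outputs(history, keep_recent=3, placeholder="[tool output masked]"):
    """Return a copy of history with all but the newest tool outputs masked.

    Reasoning and action turns are never touched, so the decision chain survives.
    """
    tool_idx = [i for i, t in enumerate(history) if t.role == "tool_output"]
    # Everything except the last `keep_recent` tool outputs gets masked.
    to_mask = set(tool_idx[:-keep_recent]) if keep_recent > 0 else set(tool_idx)
    return [
        replace(t, content=placeholder) if i in to_mask else t
        for i, t in enumerate(history)
    ]
```

Masking costs no LLM call because it is a pure rewrite of the stored history. Only the bulky observations are dropped.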

### When Completing a Task

7. **Write results to file, then switch context immediately.** Stale completed-task context is anti-signal for your next task. Don't carry it.
8. **Leave breadcrumbs.** Before switching: write what you did, what's next, and where the files are to `memory/YYYY-MM-DD.md`. Future-you needs a trailhead, not a transcript.

### When Delegating Work

9. **Spawn fresh-context sub-agents for complex sub-tasks.** Your context is noise for their work. Give them a clean prompt with just what they need.
10. **Don't inherit parent context into children.** The AutoGen pattern: each agent gets its own token budget. Inherited bloat = inherited degradation.

### Architecture (For Agent Builders)

11. **Structure context into typed blocks with hard size limits.** Every production framework converges here — Letta uses labeled blocks (human, persona, knowledge) with character caps. A monolithic context is unmanageable.
12. **Separate working memory (in-context) from reference memory (file/DB).** Your effective context is much smaller than your window size. Models lose information in the middle of long contexts.
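13. **Place critical information at the beginning or end of context, never the middle.** Models use mid-context information less reliably (Liu et al., 2023, "Lost in the Middle"). Hsieh et al. (2024, "Found in the Middle") attribute this to positional attention bias and report gains of up to 15 percentage points from calibrating it.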

## The Complexity Trap

Don't assume sophisticated compression (LLM summarization) beats simple approaches (observation masking). The JetBrains "Complexity Trap" paper (2025) tested both across 5 model configurations on SWE-bench Verified:

- Simple masking **halved cost** relative to raw agent
- Masking **matched or exceeded** LLM summarization solve rates
- Example: Qwen3-Coder went from 53.8% → 54.8% with masking alone

The lesson: start simple. Mask tool outputs. Only add summarization if masking alone isn't enough.

## Cost Model

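Without intervention, cumulative cost scales **quadratically** with the number of turns: each turn adds tokens AND reprocesses all previous tokens, so per-turn cost grows linearly and the session total grows with its sum. Periodic condensation caps the per-turn context and brings the total back to roughly **linear** scaling. OpenHands measured 2x cost reduction with their condenser.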
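
The scaling argument can be checked with a few lines of arithmetic. This is a simplified sketch, not a billing model: it counts input tokens only, ignores prompt caching and the cost of the condensation call itself, and the function name and parameters are hypothetical.

```python
def cumulative_tokens(turns, tokens_per_turn, condense_every=None, condensed_size=0):
    """Total input tokens processed over a session.

    Each turn appends `tokens_per_turn` to the context and then reprocesses
    the whole context. If `condense_every` is set, the context is shrunk to
    at most `condensed_size` tokens after every that-many turns.
    """
    context = 0
    total = 0
    for turn in range(1, turns + 1):
        context += tokens_per_turn
        total += context  # the full context is re-read on every turn
        if condense_every and turn % condense_every == 0:
            context = min(context, condensed_size)
    return total


print(cumulative_tokens(100, 1000))            # 5050000
print(cumulative_tokens(200, 1000))            # 20100000 (about 4x for 2x the turns)
print(cumulative_tokens(100, 1000, 10, 1000))  # 640000
print(cumulative_tokens(200, 1000, 10, 1000))  # 1290000 (about 2x for 2x the turns)
```

Doubling the session length roughly quadruples the uncondensed total but only doubles the condensed one, which is the quadratic-versus-linear distinction in concrete numbers.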

## Quick Reference

| Situation | Action |
|-----------|--------|
| Tool returned big output | Extract facts → file → discard raw |
| Context at ~70% | Mask old tool outputs |
| Context still growing after masking | Summarize oldest reasoning turns |
| Task complete | Write results → switch context |
| Complex sub-task needed | Spawn fresh sub-agent |
| Already condensed, still growing | Switch context or spawn |
| Critical info to preserve | Put at start or end, not middle |

## Sources

- Lindenbauer et al., "The Complexity Trap" (NeurIPS 2025 DL4C): https://arxiv.org/abs/2508.21433
- OpenHands Context Condensation (2025): https://openhands.dev/blog/openhands-context-condensensation-for-more-efficient-ai-agents
- Letta/MemGPT Memory Blocks: https://www.letta.com/blog/memory-blocks
- LLMLingua-2 (ACL 2024): https://aclanthology.org/2024.acl-long.91/
- Liu et al., "Lost in the Middle" (2023): https://arxiv.org/abs/2307.03172
- Hsieh et al., "Found in the Middle" (2024): https://arxiv.org/abs/2406.16008
- MEM1 Dynamic State Management (2025): https://arxiv.org/abs/2506.15841


added explicit inputs section with env var guidance and auth scopes, broke monolithic rules into numbered procedure steps with clear input/output contracts, extracted implied decision logic (no-summarize-summaries, fallbacks for ephemeral environments, re-run detection) into decision points section, formalized output contract with file formats and positioning rules, added outcome signals with measurable success criteria (context % thresholds, file presence, task turnarounds).

Shed: Context Hygiene for Agents

shed what you don't need. keep what matters.

named for molting, the process of shedding an outer layer to grow. your context window is your skin. when it gets too heavy, shed the dead weight.

intent

shed manages context window bloat in long-running llm agents by providing explicit decision rules for compressing, masking, switching, or delegating context. use this skill when an agent runs extended sessions, accumulates large tool outputs, approaches token limits, or suffers from performance degradation due to context overhead. also use when designing multi-agent architectures that need structured memory management over time. the core insight: tool outputs are 84% of context growth but the lowest-value tokens you carry (Lindenbauer et al., NeurIPS 2025). intervention converts quadratic cost scaling to linear.

inputs

agent state and monitoring

current context window usage percentage (from llm platform or token counter)
cumulative tool call history (call names, inputs, outputs)
reasoning/action transcript (agent's decision chain)
task completion status (in progress, completed, or stalled)

external connections

file system or persistent memory store (for writing summaries, breadcrumbs, results). env var: MEMORY_PATH (default: ./memory). no auth required.
llm api with token counting (if doing summarization). env var: LLM_API_KEY. required only if you choose the summarization path (step 5 below). rate limit aware: providers throttle by tokens per minute, often in the 100-200k tokens/min range depending on tier. budget accordingly.
optional: multi-agent orchestration framework (autogen, crewai, letta, openhands). no special auth. context isolation is the constraint, not connectivity.

knowledge/research baseline

familiarity with "found in the middle" (Hsieh et al., 2024) positional attention bias
understanding of your llm's context window size and effective working window (usually 30-60% of stated window before quality degrades)
awareness of your task's criticality (is re-spawning context acceptable if a sub-task fails?)

procedure

after every tool call

1. extract and write. when a tool returns output larger than 2kb (file contents, search results, logs, api responses), immediately extract the key facts, decisions, or findings into a bullet list. write this list to MEMORY_PATH/extracts/<task-id>-<turn>.md. discard the raw tool output from context. input: raw tool output. output: <task-id>-<turn>.md with 3-5 bullet points, stored on disk.
2. audit retention. ask: "will i need this exact output verbatim later in this task?" almost always the answer is no. the answer you extracted is what matters, not the 500 lines that contained it. if you cannot justify keeping it, mark it for masking in step 3. input: extracted facts from step 1, task objective. output: a binary decision per tool call: "mask this output" or "keep this output".
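
a minimal sketch of the extract-and-write step, assuming the agent has already produced the bullet points itself. the function name `store_extract` and its parameters are hypothetical; the 2kb threshold, the MEMORY_PATH default, and the file naming come from the text above.

```python
import os
from pathlib import Path

EXTRACT_THRESHOLD_BYTES = 2 * 1024  # the 2kb cutoff from the extract step


def store_extract(raw_output, bullets, task_id, turn, memory_path=None):
    """Write bullets to <memory_path>/extracts/<task-id>-<turn>.md.

    Returns the path written, or None when the raw output is under the
    threshold and should simply stay in context.
    """
    if len(raw_output.encode("utf-8")) < EXTRACT_THRESHOLD_BYTES:
        return None
    root = Path(memory_path or os.environ.get("MEMORY_PATH", "./memory"))
    extract_dir = root / "extracts"
    extract_dir.mkdir(parents=True, exist_ok=True)
    path = extract_dir / f"{task_id}-{turn}.md"
    path.write_text("".join(f"- {b}\n" for b in bullets), encoding="utf-8")
    return path
```

the returned path is what a masking placeholder would later point at, so the raw output can be dropped without losing the finding.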

when context reaches 70% of window

3. mask old tool outputs (no llm cost). scan your context history from oldest to most recent. for each tool call marked "mask this output" in step 2, replace the raw output with a placeholder like [Tool: file_search. Output masked. See extract at memory/extracts/task-123-turn-5.md]. preserve the tool call name and input so you can re-run it if needed. do not remove reasoning turns or action descriptions. input: context transcript, masking decisions from step 2. output: context window reduced by 20-40%, no new tokens consumed. record which outputs were masked in MEMORY_PATH/masking-log.md.
4. check if masking is enough. recalculate context usage. if it's now below 60%, stop. you're done. if it's still above 65%, continue to step 5. input: current context window percentage after masking. output: go/no-go decision for summarization.
5. summarize old reasoning turns (costs llm calls, use sparingly). if masking alone isn't enough, compress the oldest reasoning turns (10+ turns old, never current) into single-paragraph summaries, one paragraph per group of 3-5 turns. input: reasoning transcript from 10+ turns ago. output: one paragraph per 3-5 turns, ~100-150 tokens each. replace the old turns in context with the summary. log the summarization in MEMORY_PATH/summarization-log.md with timestamp and turn range. cost: ~5-10 extra api calls per condensation cycle. note: do not re-summarize an already-summarized passage (see the decision points below).
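
the thresholds in this section can be sketched as one small decision function. the function name, flags, and return labels are hypothetical. the text gives no rule for usage between 60% and 65% after masking, so the "monitor" branch is an assumption, not part of the procedure.

```python
def next_action(usage_pct, masked=False, summarized=False):
    """Map context usage (0-100) to the next hygiene step.

    masked:     masking has already run in this condensation cycle
    summarized: reasoning has already been summarized once
    """
    if not masked:
        # Before masking, the only trigger is the 70% threshold.
        return "mask" if usage_pct >= 70 else "continue"
    if usage_pct < 60:
        return "continue"  # masking was enough
    if usage_pct <= 65:
        return "monitor"   # assumption: the source leaves this band undefined
    # Above 65% after masking: summarize once, never summarize a summary.
    return "switch_or_delegate" if summarized else "summarize"
```

for example, `next_action(68, masked=True)` selects summarization, while the same reading after a prior summary selects a context switch or delegation instead.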

when completing a task

6. write results and metadata. once your task reaches completion or checkpoint, write three things to MEMORY_PATH/YYYY-MM-DD.md: (a) what you did (2-3 sentences), (b) the key outputs or files created, (c) what's next or where future context should pick up. input: task state, outputs, completion status. output: markdown file with date stamp, 200-300 words, stored in MEMORY_PATH.
7. switch context immediately. do not carry completed-task context into your next task. request a context reset from your orchestrator. cold start is better than stale signal. input: completed task metadata from step 6. output: fresh context window for next task, with only the breadcrumb file as reference.
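
a minimal sketch of the breadcrumb file, assuming a writable memory directory. the function name, the section headings inside the file, and the append-on-same-day behaviour are hypothetical choices; only the YYYY-MM-DD.md naming and the three required parts come from the text above.

```python
from datetime import date
from pathlib import Path


def write_breadcrumb(memory_path, did, outputs, next_steps, today=None):
    """Append a breadcrumb entry to <memory_path>/YYYY-MM-DD.md and return its path."""
    day = today or date.today()
    path = Path(memory_path) / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"# {day.isoformat()}", "", "## what i did", did, "", "## outputs"]
    lines += [f"- {o}" for o in outputs]
    lines += ["", "## next", next_steps, ""]
    # Append so that several tasks finished on the same day do not overwrite each other.
    with path.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines))
    return path
```

a fresh agent only needs this file's path to pick up the work, which is what lets the completed-task context be dropped.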

when delegating work

8. spawn fresh-context sub-agents. for complex sub-tasks (e.g., "refactor this module" or "debug this error chain"), create a new agent instance with a minimal prompt containing only: task objective, relevant file names/paths, success criteria, and a link to the parent breadcrumb. do not inherit parent context. input: sub-task specification, parent task breadcrumb. output: new agent with clean context window and isolated token budget.
9. do not inherit bloat. the autogen pattern applies: each agent gets its own token budget and context lifecycle. if the parent agent is at 75% context, the child agent starts at 0%. shared memory is a file store, not shared context. input: parent context state, child task spec. output: child agent instance with no inherited context bloat.
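
a sketch of the minimal child prompt described above. the `SubTask` type and `build_child_prompt` function are hypothetical and not tied to any orchestration framework; the four fields are the ones the text lists.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    objective: str
    files: list[str]
    success_criteria: str
    parent_breadcrumb: str  # path to the parent's breadcrumb file, not its transcript


def build_child_prompt(task):
    """Build the child's entire starting context from the four allowed fields."""
    files = "\n".join(f"- {p}" for p in task.files)
    return (
        f"objective: {task.objective}\n"
        f"relevant files:\n{files}\n"
        f"success criteria: {task.success_criteria}\n"
        f"parent breadcrumb: {task.parent_breadcrumb}\n"
    )
```

because the function has no access to the parent's transcript, nothing from it can leak into the child. anything the child needs beyond these fields has to go through the file store.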

for agent architects

10. structure context into typed blocks with hard size caps. design your context as labeled sections: (a) system prompt, (b) task state, (c) working memory (current tool outputs and reasoning), (d) reference memory (links to external files). assign character caps per block: system = 1000, task state = 500, working memory = variable (monitor this), reference = 200. letta and openhands both converge on this model. input: agent architecture design. output: context schema with explicit block sizes and governance rules.
11. separate in-context memory from file/db memory. your true effective context is much smaller than your window size. models lose information in the middle. put only active working state in context; push summaries, logs, and intermediate results to disk. input: all task outputs and reasoning. output: (a) context window with 2-3 most recent turns, (b) MEMORY_PATH with complete history.
12. position critical info at the start or end, never the middle. models use information in the middle of long contexts less reliably (the "lost in the middle" effect, Liu et al., 2023); Hsieh et al. (2024, "found in the middle") trace this to positional attention bias and report gains of up to 15 percentage points from calibrating it. place your task objective and success criteria in the first 100 tokens and your most recent reasoning at the end. bury old extracts and masked outputs in the middle. input: context structure. output: reordered context with critical info at boundaries.
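
a sketch combining the typed blocks, the character caps, and the positional ordering described above. the block labels, the `assemble` function, and the choice to raise on overflow instead of truncating are hypothetical; the cap values are the ones given in the text.

```python
# Character caps from the text; working memory is uncapped but should be monitored.
CAPS = {"system": 1000, "task_state": 500, "reference": 200}

# Objective and criteria first, references in the middle, recent reasoning last.
ORDER = ["system", "task_state", "reference", "working_memory"]


def assemble(blocks):
    """Validate block sizes, then join blocks in boundary-first order.

    blocks: dict mapping a label from ORDER to that block's text.
    """
    unknown = set(blocks) - set(ORDER)
    if unknown:
        raise ValueError(f"unknown block labels: {sorted(unknown)}")
    for label, text in blocks.items():
        cap = CAPS.get(label)
        if cap is not None and len(text) > cap:
            raise ValueError(f"{label} block is {len(text)} chars, cap is {cap}")
    return "\n\n".join(f"[{label}]\n{blocks[label]}" for label in ORDER if label in blocks)
```

raising on overflow keeps the cap hard: the caller has to decide what to push to disk, and nothing is cut off silently.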

decision points

if tool output is small (under 2kb): keep it in context. step 1 applies only to outputs >= 2kb. brief outputs add clarity and cost almost nothing.

if you've already condensed once (masked and summarized) and context is still growing at 65%+: do not re-summarize summaries. recursive summarization compounds errors. instead, spawn a sub-agent or switch context entirely (step 7). input: condensation history, current context %. output: decision to either delegate (step 8) or reset (step 7).

if you have no persistent file system (e.g., ephemeral serverless environment): fall back to in-context masking only. steps 1, 3, and 6 all assume a writable MEMORY_PATH. if unavailable, use placeholder comments in context (e.g., [masked: file_search result]) and accept higher context overhead. this degrades cost efficiency but is workable for short sessions (<20 turns). input: environment capabilities. output: modified procedure using only in-context masking.

if task is a one-off and won't spawn sub-agents: skip steps 8-9. delegation is overhead if you're not building multi-agent workflows. masking and summarization alone cover 95% of single-agent use cases. input: agent architecture (single vs. multi-agent). output: skip or include delegation steps.

if your llm provider doesn't support token counting: estimate context % by character count. assume 1 token ~= 3-4 characters (varies by model and tokenizer). input: context text length. output: estimated % of window (divide character count by ~3.5 to get tokens, divide by your known token window, multiply by 100). this is rough but sufficient for triggering the 70% threshold.
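
the character-count fallback is a single line of arithmetic. the function name is hypothetical, and the 3.5 characters-per-token default is the rough midpoint given above, not a property of any particular tokenizer.

```python
def estimate_context_pct(text, window_tokens, chars_per_token=3.5):
    """Rough context usage in percent, estimated from character count alone."""
    est_tokens = len(text) / chars_per_token
    return 100.0 * est_tokens / window_tokens


# 350,000 characters against a 128,000-token window
print(estimate_context_pct("a" * 350_000, 128_000))  # 78.125
```

an estimate of about 78% is over the 70% threshold, so it would trigger masking even though the true token count is unknown.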

if an extract becomes critical for re-running a tool: keep the reference in context. step 2's "audit retention" is a heuristic. if you realize mid-task that you need to re-run a tool with its prior output (e.g., debugging a failed api call), revert the masking decision for that extract. input: task state, tool failure analysis. output: unmasking the specific extract, keeping it in active context.

output contract

success means:

context usage stays below 75% of window throughout a task session, measured at step 4 after each masking cycle.
MEMORY_PATH/extracts/ contains one .md file per masked tool output, with 3-5 bullet points and a timestamp.
MEMORY_PATH/masking-log.md has chronological entries: [timestamp] masked <N> outputs, new context %: <X>. updated after each masking run.
MEMORY_PATH/summarization-log.md (if used) has entries: [timestamp] summarized turns <X-Y>, saved <N> tokens, new context %: <Z>. only present if summarization was triggered.
MEMORY_PATH/YYYY-MM-DD.md exists at task completion with (a) 2-3 sentence summary, (b) list of output files, (c) next-step breadcrumb.
if using sub-agents, each sub-agent's context starts at 0% and has its own MEMORY_PATH or isolated store.
all critical task objectives and recent reasoning are positioned in the first 100 tokens (system/task block) or last 100 tokens (recent turns) of context, never the middle 50%.
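
the masking-log entry format in the contract above can be produced with a small helper. the function name, the iso-8601 timestamp, and the whole-number rounding are assumptions; the contract only fixes the order of the fields.

```python
from datetime import datetime, timezone


def masking_log_entry(n_masked, context_pct, now=None):
    """Format one line for masking-log.md: [timestamp] masked <N> outputs, new context %: <X>."""
    ts = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return f"[{ts}] masked {n_masked} outputs, new context %: {context_pct:.0f}\n"
```

appending one such line per masking run keeps the log chronological, as the contract requires.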

outcome signal

you know the skill worked when:

your agent completes tasks with stable context usage (no runaway token growth past 75%).
step 3 masking reduces context by 20-40% without loss of task continuity (agent can still reason about masked results via the extract files).
if summarization is triggered, it costs fewer llm api calls than a full context reset would (estimate: ~10 summarization tokens per turn saved vs. re-running a full task on fresh context).
step 6 breadcrumbs allow a fresh agent to pick up a follow-up task in <5 turns (no repeated exploration or re-learning).
in multi-agent setups, sub-agents (step 8) succeed without inheriting parent bloat and complete sub-tasks in fewer turns than if they'd inherited full parent context.
your agent's solve rate on long tasks (>50 turns) matches or exceeds its solve rate on short tasks (<20 turns), indicating context hygiene is preventing degradation.

credits: original design and research curation by compass-soul. methodology grounded in Lindenbauer et al. (NeurIPS 2025 DL4C), OpenHands, Letta/MemGPT, and LLMLingua.