Context Window Management for Multi-Step AI Agents
Every AI agent has a context window. When a workflow exceeds that window -- through accumulated conversation history, tool outputs, and intermediate results -- the agent starts losing information. The failure mode is not a crash. It is a silent degradation of output quality that gets worse with every step.
The strategies below come from running multi-step agent workflows with 20+ steps.
The Problem: Context Accumulation
A typical 20-step agent workflow accumulates context at each step. By step 10, the agent's context includes the full output of steps 1-9. By step 15, most of that context is no longer relevant, but the agent is paying for it in every API call -- both in cost and in attention degradation.
The empirical observation: when context accumulates without management, agent output quality drops measurably after steps 5-7. Fidelity to the original task degrades by approximately 30% per delegation layer.
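To make the accumulation concrete, here is a rough back-of-the-envelope sketch. It assumes every step emits the same number of tokens and that the full history is re-sent on each call; it also assumes the per-layer fidelity loss compounds multiplicatively, which the figure above does not state explicitly. The function names and the 500-token step size are illustrative only.

```python
def cumulative_input_tokens(steps: int, tokens_per_step: int) -> int:
    """Total prior-output tokens re-sent across a whole workflow.

    Assumes step k re-reads the outputs of steps 1..k-1 in full.
    """
    return sum((k - 1) * tokens_per_step for k in range(1, steps + 1))


def remaining_fidelity(layers: int, loss_per_layer: float = 0.30) -> float:
    """Fraction of task fidelity left after `layers` delegation layers.

    Assumes the loss compounds: each layer keeps (1 - loss_per_layer)
    of whatever fidelity reached it.
    """
    return (1.0 - loss_per_layer) ** layers


if __name__ == "__main__":
    print(cumulative_input_tokens(20, 500))
    print(round(remaining_fidelity(3), 3))
```

Under these assumptions, a 20-step workflow with 500-token step outputs re-sends 95,000 tokens of earlier output in total, and three delegation layers leave roughly a third of the original fidelity.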
Strategy 1: Three-Tier Context Partitioning
Not all context is equal. Partition it into three tiers with different propagation rules:
- Tier 1 (Invariants): Propagated verbatim to every step. Never compressed. Target: under 200 tokens. Includes: the user's original request, global constraints, success criteria, anti-goals.
- Tier 2 (Orientation): Compressed per step, to ~70% of its previous size at each layer. Includes: domain knowledge, prior results, failure history.
- Tier 3 (Operational): Not propagated. Available via retrieval pointers. Raw data, full conversation history, source documents.
The partition rule: if losing it causes the agent to solve the WRONG problem, it is Tier 1. If losing it causes the agent to solve the right problem POORLY, it is Tier 2. If losing it causes the agent to re-derive something already known, it is Tier 3.
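The partition rule and the three propagation rules could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `ContextItem`, `classify`, `build_handoff`, the `pointer` field, and the `compress` callable (standing in for whatever summarizer is used) are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Tier(Enum):
    INVARIANT = 1    # losing it: the agent solves the wrong problem
    ORIENTATION = 2  # losing it: the agent solves the right problem poorly
    OPERATIONAL = 3  # losing it: the agent re-derives something already known


def classify(wrong_problem_if_lost: bool, poor_solution_if_lost: bool) -> Tier:
    """Apply the partition rule, checking the most severe consequence first."""
    if wrong_problem_if_lost:
        return Tier.INVARIANT
    if poor_solution_if_lost:
        return Tier.ORIENTATION
    return Tier.OPERATIONAL


@dataclass
class ContextItem:
    text: str
    tier: Tier
    pointer: str = ""  # hypothetical retrieval key, only used for Tier 3


def build_handoff(items: list[ContextItem], compress: Callable[[str], str]) -> str:
    """Assemble the context that is passed to the next step."""
    parts = []
    for item in items:
        if item.tier is Tier.INVARIANT:
            parts.append(item.text)            # verbatim, never compressed
        elif item.tier is Tier.ORIENTATION:
            parts.append(compress(item.text))  # compressed at every step
        else:
            # Not propagated: only a retrieval pointer travels with the handoff.
            parts.append(f"[available on request: {item.pointer}]")
    return "\n".join(parts)
```

Keeping the tier on each item, rather than in three separate buffers, means the same list can be re-partitioned if an item turns out to matter more than expected.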
Strategy 2: Semantic Hashing
How do you know if context was lost? You cannot ask the agent "did you lose context?" because it does not know what it does not know.
Semantic hashing solves this: generate 3-5 questions with known-correct answers from the delegator's perspective. Ask the receiving agent to answer them. If the answers diverge, context was lost and you can repair it before the agent does any work.
Cost: ~100 tokens per handoff. Benefit: early detection of semantic drift before the agent burns its token budget solving the wrong problem.
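A probe check along these lines might look like the sketch below. `Probe`, `failed_probes`, and the `ask` callable (a wrapper around the receiving agent) are hypothetical, and the normalized substring comparison is a deliberately crude stand-in: short expected answers can match by accident, so a real system would likely need a more robust comparison.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Probe:
    question: str
    expected: str  # answer the delegator knows to be correct


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def failed_probes(probes: list[Probe], ask: Callable[[str], str]) -> list[Probe]:
    """Return the probes whose answer does not contain the expected answer.

    `ask` sends one question to the receiving agent and returns its reply.
    """
    return [
        probe
        for probe in probes
        if normalize(probe.expected) not in normalize(ask(probe.question))
    ]
```

An empty result means the handoff passed; any returned probe identifies which piece of context to repair before the receiving agent starts work.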
Strategy 3: Uncertainty Manifests
Before producing output, every agent must declare:
- What assumptions am I making that were not explicitly stated?
- What information would change my approach if I had it?
- What am I uncertain about?
The delegating agent compares this manifest against its own knowledge. If the worker lists an assumption the delegator knows to be false, context loss is detected BEFORE the output is produced.
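One way to make the manifest comparable is to key each assumption by topic, as in this sketch. The `UncertaintyManifest` shape and the `find_conflicts` helper are assumptions made for illustration; in practice the comparison might be done by the delegating model itself rather than by exact string matching.

```python
from dataclasses import dataclass, field


@dataclass
class UncertaintyManifest:
    # Unstated assumptions the worker is making, keyed by topic.
    assumptions: dict[str, str] = field(default_factory=dict)
    # Information that would change the worker's approach if it had it.
    would_change_approach: list[str] = field(default_factory=list)
    # Anything else the worker is unsure about.
    uncertainties: list[str] = field(default_factory=list)


def find_conflicts(
    manifest: UncertaintyManifest, delegator_facts: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Map topic -> (worker's assumption, delegator's fact) where they disagree."""
    return {
        topic: (assumed, delegator_facts[topic])
        for topic, assumed in manifest.assumptions.items()
        if topic in delegator_facts and delegator_facts[topic] != assumed
    }
```

A non-empty result is the signal to correct the worker's context before it produces any output; topics the delegator knows nothing about are left alone.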
Strategy 4: Orchestrator Succession
When an orchestrator reaches 80% context usage, it should checkpoint its state and spawn a successor. The successor reads the state artifact and resumes from the last checkpoint. Work completed before the checkpoint is preserved, and the new orchestrator starts with a fresh context window.
This transforms the context window from a hard limit into a soft limit. In principle, the system can run indefinitely by spawning successor orchestrators.
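The checkpoint-and-resume cycle can be sketched as below. The shape of `OrchestratorState`, the JSON artifact format, and the function names are assumptions for illustration; only the 80% threshold comes from the text above.

```python
import json
from dataclasses import asdict, dataclass, field

SUCCESSION_PERCENT = 80  # hand off at 80% context usage


@dataclass
class OrchestratorState:
    task: str                                   # the original request
    completed_steps: list[str] = field(default_factory=list)
    next_step: int = 0                          # index of the step to run next


def should_hand_off(tokens_used: int, context_limit: int) -> bool:
    """True once usage reaches the threshold (integer math avoids float rounding)."""
    return tokens_used * 100 >= SUCCESSION_PERCENT * context_limit


def checkpoint(state: OrchestratorState) -> str:
    """Serialize the state into an artifact the successor can read."""
    return json.dumps(asdict(state))


def resume(artifact: str) -> OrchestratorState:
    """Rebuild the state inside a freshly spawned successor."""
    return OrchestratorState(**json.loads(artifact))
```

The successor only ever sees the artifact, so anything that is not written into the state is gone after the handoff. That makes the choice of state fields the real design decision here.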
Strategy 5: Context Pruning
After each step, summarize the result before passing it to the next step. Use a cheap model for the summarization and give it a specific instruction: "Extract only the data types and function signatures" rather than "Summarize the previous step."
The specificity of the summarization instruction determines quality. A vague instruction produces a vague summary that loses critical details.
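A pruning step that forces a specific instruction might be sketched like this. `build_pruning_prompt`, `prune`, and the `summarize` callable (standing in for the call to the cheap model) are hypothetical names, and the prompt wording beyond the quoted instruction is an assumption.

```python
from typing import Callable


def build_pruning_prompt(step_output: str, extract: list[str]) -> str:
    """Build a summarization prompt that names exactly what must survive."""
    if not extract:
        # Refuse the vague case: with nothing named, the prompt degenerates
        # into "summarize the previous step".
        raise ValueError("name at least one thing to extract")
    wanted = " and ".join(extract)
    return (
        f"Extract only the {wanted} from the text below. "
        f"Omit everything else.\n\n{step_output}"
    )


def prune(step_output: str, extract: list[str], summarize: Callable[[str], str]) -> str:
    """Run one pruning pass; `summarize` sends a prompt to the cheap model."""
    return summarize(build_pruning_prompt(step_output, extract))
```

Calling `build_pruning_prompt(output, ["data types", "function signatures"])` reproduces the specific instruction quoted above, while an empty list is rejected instead of silently producing a generic summary request.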
When to Apply Each Strategy
| Workflow Length | Strategy |
|---|---|
| 1-5 steps | No management needed. Context fits. |
| 6-10 steps | Context pruning after step 3. Tier 1/2 partition. |
| 11-20 steps | Full three-tier partitioning + semantic hashing + pruning. |
| More than 20 steps | All of the above + orchestrator succession. |
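The table can also be expressed as a small lookup, sketched here. The function name and strategy labels are illustrative, and workflows of exactly 5, 10, or 20 steps are assigned to the shorter range, which is an assumption.

```python
def strategies_for(steps: int) -> list[str]:
    """Return the strategies the table suggests for a workflow of `steps` steps."""
    if steps < 1:
        raise ValueError("a workflow needs at least one step")
    if steps <= 5:
        return []  # no management needed, context fits
    if steps <= 10:
        return ["context pruning after step 3", "tier 1/2 partition"]
    full = ["three-tier partitioning", "semantic hashing", "context pruning"]
    if steps <= 20:
        return full
    return full + ["orchestrator succession"]
```

Returning an empty list for short workflows keeps the caller's logic uniform: it applies whatever comes back without special-casing the "no management" row.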
Context management is one of 15 patterns in the Protocol Playbook.