Let the agent track its own progress
“The agent can track its own progress, and I can see it.” Give the model a checklist, then add a small mechanism that reminds it to keep the list updated.
Structured self-planning
Claude Code routinely works through multi-step tasks: grep for references → read files → edit code → run tests. If you let the model “wing it”, the first few steps go fine, then it starts losing track of what remains, and eventually it stalls mid-task.
s03’s solution is a todo tool: the model calls todo to add items to a list, and TodoManager validates structure, persists state, and returns the current view. Two benefits:
- The model is forced to make its plan explicit — just writing it out already helps it think.
- Humans can see what it’s thinking. Debugging gets much easier, because you can see exactly which step the model was on when something went wrong.
    # TODO view: each item is structured
    [ ] #1: grep "TODO" across src/
    [>] #2: read src/app.py and list comments   # in progress
    [ ] #3: generate summary markdown
    [ ] #4: write to TODO_LIST.md
    (0/4 completed)
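As a rough illustration of how such a view could be produced, here is a minimal sketch. It is not s03's actual TodoManager: the `TodoItem` class, the `render` function, the status names, and the `[x]` marker for completed items are all assumptions made for this example.

```python
from dataclasses import dataclass

# Marker per status. "[ ]" and "[>]" match the view above;
# the status names and the "[x]" marker are assumptions for this sketch.
MARKERS = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}


@dataclass
class TodoItem:
    id: int
    text: str
    status: str = "pending"


def render(items: list[TodoItem]) -> str:
    """Return the checklist as text: one line per item plus a progress footer."""
    lines = [f"{MARKERS[item.status]} #{item.id}: {item.text}" for item in items]
    done = sum(item.status == "completed" for item in items)
    lines.append(f"({done}/{len(items)} completed)")
    return "\n".join(lines)


items = [
    TodoItem(1, 'grep "TODO" across src/'),
    TodoItem(2, "read src/app.py and list comments", "in_progress"),
    TodoItem(3, "generate summary markdown"),
    TodoItem(4, "write to TODO_LIST.md"),
]
print(render(items))
```

Returning the rendered text from every todo call means the model and the human watching the transcript both see the same state.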
One hard rule: only one in_progress at a time
TodoManager.update() enforces this constraint:
    if in_progress_count > 1:
        raise ValueError("Only one task can be in_progress at a time")
Strict, but it helps the model. If three items can be “in progress” at once, the model spreads its attention across all of them and finishes none. Allowing a single in_progress item forces it to advance one task at a time: it has to close out the current item before it can mark the next one as started.
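To make the rule concrete, the sketch below validates a few payloads the way a model might submit them. Only the final `in_progress_count` check and its error message come from the text above. The payload shape (dicts with `id`, `text`, `status`), the status names, and the `validate` and `try_payload` helpers are assumptions for illustration.

```python
# Assumed status vocabulary; the source only names "in_progress".
VALID_STATUSES = {"pending", "in_progress", "completed"}


def validate(items: list[dict]) -> list[dict]:
    """Return the payload unchanged if it is well-formed, else raise ValueError."""
    in_progress_count = 0
    for item in items:
        if not str(item.get("text", "")).strip():
            raise ValueError(f"Item {item.get('id')}: text required")
        if item.get("status") not in VALID_STATUSES:
            raise ValueError(f"Item {item.get('id')}: invalid status {item.get('status')!r}")
        if item["status"] == "in_progress":
            in_progress_count += 1
    # The hard rule from the text: at most one item may be in progress.
    if in_progress_count > 1:
        raise ValueError("Only one task can be in_progress at a time")
    return items


def try_payload(items: list[dict]) -> str:
    """Report whether a payload passes validation."""
    try:
        validate(items)
        return "accepted"
    except ValueError as err:
        return f"rejected: {err}"


ok = [
    {"id": 1, "text": "grep for references", "status": "completed"},
    {"id": 2, "text": "read files", "status": "in_progress"},
]
bad = [
    {"id": 1, "text": "grep for references", "status": "in_progress"},
    {"id": 2, "text": "read files", "status": "in_progress"},
]
print(try_payload(ok))   # accepted
print(try_payload(bad))  # rejected: Only one task can be in_progress at a time
```

Because the error text goes back to the model as the tool result, a rejected payload doubles as an instruction on how to fix it.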
Nag reminder: 3 turns without an update? A nudge.
Even with a todo tool, the model sometimes “forgets” to update the list — it does a bunch of work, but the in_progress item stays stuck at #2. s03’s fix is a simple counter:
    rounds_since_todo = 0
    while True:
        response = LLM(messages, tools)
        ...
        used_todo = any(b.name == "todo" for b in tool_uses)
        rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
        if rounds_since_todo >= 3:
            results.append({"type": "text", "text": "<reminder>Update your todos.</reminder>"})
Once the counter reaches 3, a reminder is appended to the results that make up the next user message. On seeing it, the model usually calls todo. It’s an engineering technique that turns a soft guideline (“please keep your todos updated”) into a concrete prompt the model sees at the moment it matters.
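The counter logic can be exercised without a real model. The sketch below replays a scripted sequence of tool calls and reports which rounds would receive the reminder. The `run` function, the `NAG_AFTER` constant, the tool names `bash` and `read_file`, and the `tool_result` dict shape are hypothetical stand-ins. Only the counter update and the reminder text follow the loop above.

```python
REMINDER = "<reminder>Update your todos.</reminder>"
NAG_AFTER = 3  # rounds without a todo call before nudging, as in the text


def run(scripted_rounds: list[list[str]]) -> list[int]:
    """Replay tool names per round; return the 1-based rounds that got a reminder."""
    rounds_since_todo = 0
    nagged = []
    for round_no, tool_names in enumerate(scripted_rounds, start=1):
        # Stand-in for executing each tool call and collecting its result.
        results = [{"type": "tool_result", "content": f"ran {name}"} for name in tool_names]
        used_todo = "todo" in tool_names
        rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
        if rounds_since_todo >= NAG_AFTER:
            results.append({"type": "text", "text": REMINDER})
            nagged.append(round_no)
        # In a real loop, `results` would be sent back as the next user message.
    return nagged


script = [["todo"], ["bash"], ["read_file"], ["bash"], ["todo"], ["bash"]]
print(run(script))  # [4]
```

Note that only a todo call resets the counter, so under this logic the reminder repeats every round until the model actually updates the list.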
What is this pattern called?
In agent design circles, this is called structured self-planning with soft nudges — give the model a structured state it must write to, supplemented by timely reminders. Claude Code’s real implementation uses a similar pattern, but more conservatively (lower frequency, neutral wording).
Why not just put “update todo every step” in the system prompt? You can, but a model’s compliance with general system-prompt instructions tends to degrade as the conversation grows longer. Moving the instruction into a reminder that is re-injected whenever the list goes stale gives much more stable results.