Lesson 03 · Planning

Let the agent track its own progress

“The agent can track its own progress — and I can see it.” Give the model a checklist, then use a small mechanism to make it remember to update that list.

⏱ ~10 min · 📝 3 interactive widgets · 🧑‍💻 Based on shareAI-lab · s03_todo_write.py

Structured self-planning

Claude Code routinely works through multi-step tasks: grep for references → read files → edit code → run tests. If you let the model “wing it”, the first few steps go fine, then it starts forgetting, and eventually it stalls mid-task.

s03’s solution is a todo tool: the model calls todo to add items to a list, and TodoManager validates structure, persists state, and returns the current view. Two benefits:

  • The model is forced to make its plan explicit — just writing it out already helps it think.
  • Humans can see what it’s thinking. Debugging gets ten times easier.
# TODO view — each item is structured
[ ] #1: grep "TODO" across src/
[>] #2: read src/app.py and list comments     # in progress
[ ] #3: generate summary markdown
[ ] #4: write to TODO_LIST.md

(0/4 completed)

One hard rule: only one in_progress at a time

TodoManager.update() enforces this constraint:

if in_progress_count > 1:
    raise ValueError("Only one task can be in_progress at a time")

Strict, but it’s helping the model. If three items can be “in progress” simultaneously, the model battles on all fronts and finishes none of them. Forcing single-task advancement means it must complete each item before moving to the next.

The widget below lets you act as the model: submit different todo payloads and see which pass validation and which are rejected.

Nag reminder: 3 turns without an update? A nudge.

Even with a todo tool, the model sometimes “forgets” to update the list — it does a bunch of work, but the in_progress item stays stuck at #2. s03’s fix is a simple counter:

rounds_since_todo = 0
while True:
    response = LLM(messages, tools)
    ...
    used_todo = any(b.name == "todo" for b in tool_uses)
    rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
    if rounds_since_todo >= 3:
        results.append({"type":"text", "text":"<reminder>Update your todos.</reminder>"})

Once the counter hits 3, a reminder is injected into the next user message. The model instinctively calls todo when it sees this. It’s an engineering technique that turns a soft guideline (“please keep your todos updated”) into a hard stimulus.

What is this pattern called?

In agent design circles, this is called structured self-planning with soft nudges — give the model a structured state it must write to, supplemented by timely reminders. Claude Code’s real implementation uses a similar pattern, but more conservatively (lower frequency, neutral wording).

Why not just put “update todo every step” in the system prompt? You can, but a model’s compliance with general system-prompt instructions degrades as the conversation grows longer. Splitting the instruction into a repeatedly re-injected reminder gives much more stable results.
Interactive

Widget 1 · Kanban · todo state evolving across turns

Click Step and watch the model move tasks from pending to in_progress to completed. Notice that in_progress is always exactly one item.

[ ] pending
[>] in_progress
[x] completed
Ready to start…
Interactive

Widget 2 · Validation · which of 5 todo payloads pass?

When the model calls the todo tool it passes an items array. TodoManager runs a series of checks: non-empty text, valid status, at most one in_progress, total count ≤ 20. Click Pass or Reject for each payload.

Correct: 0 / 5
Interactive

Widget 3 · Nag Counter · what happens after 3 turns with no todo update

Click Next Turn and watch the counter. Each turn randomly decides whether the model calls todo or not — just like real-world model behavior.

rounds_since_todo: 0