S05 · Skill Loading — Learn Claude Code

The "stuff everything in the system prompt" trap

You have 20 skills, each pretty detailed: pdf-processing (how to read PDFs), code-review (a review checklist), git-workflow (common git patterns)... The intuitive move: concatenate all of them into the system prompt so the model can look them up at any time.

The result:

Every call burns 15-30K input tokens, even when the task uses none of the skills.
Model attention is diluted - compliance with rules buried in a long system prompt degrades.
Change one skill and every cached conversation is invalidated.

s05's approach: split it into two layers.

The two-layer architecture

Layer 1 - cheap: The system prompt holds only the skill name and a one-sentence description (~100 tokens each). 20 skills = 2K tokens. Acceptable.

# Skill registry in the system prompt
Skills available:
  - pdf: Process PDF files. Extract text, tables, metadata.
  - code-review: Systematic code review checklist.
  - git-workflow: Common git branching and rebase patterns.

Layer 2 - on demand: When the model wants to use a skill, it calls load_skill(name="pdf") and the full skill body (potentially 5-10K tokens) is injected via tool_result. Unused skills cost zero tokens.

# tool_result contains the full skill body
<skill name="pdf">
  Step 1: Use pdfplumber for extraction...
  Step 2: Handle OCR fallback when needed...
  Step 3: Structure output as Markdown table...
</skill>

Token cost comparison

Let's measure a real scenario. Assume 20 skills, each body averaging 3000 tokens. The user asks a question that probably needs no skills at all (e.g. "fix the login endpoint bug").

The SKILL.md format

Skill files use YAML frontmatter + body:

---
name: pdf
description: Process PDF files. Extract text, tables, metadata.
tags: document,parsing
---

Step 1: Use pdfplumber for extraction. Handle multi-column layouts...
Step 2: For scanned PDFs, fall back to OCR via tesseract...

The frontmatter feeds Layer 1 (name/description/tags); the body feeds Layer 2. This style is borrowed from static site generators (Jekyll, Hugo) - anyone familiar with them will recognize it immediately.

Everything in system prompt

System prompt: 60000 tokens
(20 skills x 3000 tokens each, all loaded)
x conversations: 1

Total: 60000 tokens

Two-layer architecture

System prompt: 2000 tokens
(20 descriptions x ~100 tokens each)
+ on-demand skill bodies: 0 tokens
(loaded every 5 conversations)

Total: 2000 tokens

Conversations N: 1

Saved 0%

SKILL.md (editable)

Layer 1 · injected into system prompt

Layer 2 · tool_result from load_skill

Domain knowledge loaded on demand

The "stuff everything in the system prompt" trap

The two-layer architecture

Token cost comparison

The SKILL.md format

Widget 1 · Token Economy · two architectures side by side

Widget 2 · Frontmatter Parser · extracting skill metadata

Widget 3 · Discoverability · good descriptions let the model find skills