Domain knowledge loaded on demand
"Don't put everything in the system prompt. Load on demand."
The "stuff everything in the system prompt" trap
You have 20 skills, each pretty detailed: pdf-processing (how to read PDFs), code-review (a review checklist), git-workflow (common git patterns)... The intuitive move: concatenate all of them into the system prompt so the model can look them up at any time.
The result:
- Every call burns 15-30K input tokens, even when the task uses none of the skills.
- Model attention is diluted - compliance with rules buried in a long system prompt degrades.
- Change one skill and every cached conversation is invalidated.
s05's approach: split each skill into two layers.
The two-layer architecture
Layer 1 - cheap: The system prompt holds only each skill's name and a one-sentence description (~100 tokens per skill). 20 skills = ~2K tokens. Acceptable.
# Skill registry in the system prompt
Skills available:
- pdf: Process PDF files. Extract text, tables, metadata.
- code-review: Systematic code review checklist.
- git-workflow: Common git branching and rebase patterns.
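One way the Layer 1 listing could be generated is sketched below. The `Skill` dataclass and `build_registry` function are hypothetical names chosen for illustration, not part of s05; the only thing taken from the text is the output format shown above.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical container for one skill (names are illustrative)."""
    name: str
    description: str  # one sentence, goes into the system prompt
    body: str         # full instructions, only sent when the skill is loaded


def build_registry(skills: list[Skill]) -> str:
    """Render the Layer 1 listing: one line per skill, never the body."""
    lines = ["Skills available:"]
    for skill in skills:
        lines.append(f"- {skill.name}: {skill.description}")
    return "\n".join(lines)


skills = [
    Skill("pdf", "Process PDF files. Extract text, tables, metadata.",
          "Step 1: Use pdfplumber for extraction..."),
    Skill("code-review", "Systematic code review checklist.",
          "(full checklist body)"),
    Skill("git-workflow", "Common git branching and rebase patterns.",
          "(full workflow body)"),
]

registry = build_registry(skills)
print(registry)
```

Because the bodies never reach this string, editing a skill's detailed instructions leaves the system prompt unchanged.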
Layer 2 - on demand: When the model wants to use a skill, it calls load_skill(name="pdf") and the full skill body (potentially 5-10K tokens) is injected via tool_result. Unused skills cost zero tokens.
# tool_result contains the full skill body
<skill name="pdf">
Step 1: Use pdfplumber for extraction...
Step 2: Handle OCR fallback when needed...
Step 3: Structure output as Markdown table...
</skill>
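A minimal sketch of the handler behind `load_skill` might look like the following. The in-memory `SKILLS` dict, the error message, and the `make_tool_result` helper are assumptions for illustration; the dict shape of the tool result is likewise an assumption and should be adapted to whatever your model API expects.

```python
# Hypothetical in-memory store: skill name -> full body text.
SKILLS = {
    "pdf": (
        "Step 1: Use pdfplumber for extraction...\n"
        "Step 2: Handle OCR fallback when needed...\n"
        "Step 3: Structure output as Markdown table..."
    ),
}


def load_skill(name: str) -> str:
    """Return the full skill body wrapped in <skill> tags, or an error string."""
    body = SKILLS.get(name)
    if body is None:
        known = ", ".join(sorted(SKILLS))
        # Returning the error as text lets the model see it and pick a valid name.
        return f"Error: unknown skill '{name}'. Available: {known}"
    return f'<skill name="{name}">\n{body}\n</skill>'


def make_tool_result(tool_use_id: str, name: str) -> dict:
    """Wrap the skill body as a tool result (field names are an assumption)."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": load_skill(name),
    }


result = make_tool_result("call_1", "pdf")
print(result["content"])
```

The body enters the conversation only at the turn where the model asks for it, so conversations that never call `load_skill` never pay for it.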
Token cost comparison
Consider a concrete scenario. Assume 20 skills, each body averaging 3,000 tokens. The user asks a question that probably needs no skills at all (e.g. "fix the login endpoint bug").
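The arithmetic for this scenario can be worked out directly. The sketch below uses only the figures stated in the text (20 skills, 3,000-token bodies, roughly 100 tokens per registry entry) and ignores everything else in the prompt; the function names are illustrative.

```python
NUM_SKILLS = 20
BODY_TOKENS = 3000      # average full skill body, from the scenario above
REGISTRY_TOKENS = 100   # name + one-sentence description per skill (Layer 1)


def stuffed_cost() -> int:
    """Skill tokens per call when every body sits in the system prompt."""
    return NUM_SKILLS * BODY_TOKENS


def layered_cost(skills_loaded: int) -> int:
    """Skill tokens per call with the registry plus only the loaded bodies."""
    return NUM_SKILLS * REGISTRY_TOKENS + skills_loaded * BODY_TOKENS


print(stuffed_cost())    # 60000
print(layered_cost(0))   # 2000 (the "fix the login endpoint bug" case)
print(layered_cost(1))   # 5000 (one skill actually needed)
```

Under these assumptions the two-layer design costs about 1/30 of the stuffed prompt when no skill is needed, and still well under a tenth when one skill is loaded.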
The SKILL.md format
Skill files use YAML frontmatter + body:
---
name: pdf
description: Process PDF files. Extract text, tables, metadata.
tags: document,parsing
---
Step 1: Use pdfplumber for extraction. Handle multi-column layouts...
Step 2: For scanned PDFs, fall back to OCR via tesseract...
The frontmatter feeds Layer 1 (name/description/tags); the body feeds Layer 2. This style is borrowed from static site generators (Jekyll, Hugo) - anyone familiar with them will recognize it immediately.
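Splitting a SKILL.md file into its two layers could be sketched as follows. This is a deliberately minimal parser that only handles flat `key: value` lines like the example above; it is not a full YAML parser, and `parse_skill_md` is a hypothetical name. A real implementation would more likely hand the frontmatter to a YAML library.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter dict, body string).

    Only supports flat `key: value` frontmatter; nested YAML is out of scope.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' delimiter")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("missing closing '---' delimiter") from None

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


SAMPLE = """\
---
name: pdf
description: Process PDF files. Extract text, tables, metadata.
tags: document,parsing
---
Step 1: Use pdfplumber for extraction. Handle multi-column layouts...
Step 2: For scanned PDFs, fall back to OCR via tesseract...
"""

meta, body = parse_skill_md(SAMPLE)
print(meta["name"], "->", meta["description"])  # Layer 1 material
print(body)                                     # Layer 2 material
```

Here `meta` supplies the registry line for the system prompt, while `body` is held back until the skill is loaded.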