Find the skill pulling your agent's performance down
使うタイミング: You feel the agent has gotten worse, not better, as you added skills.
前提条件
- Node 20+ — nvm install 20
- Skill cloned and installed — git clone https://github.com/Evol-ai/SkillCompass ~/.claude/skills/SkillCompass; npm i
フロー
-
Run the evaluatorScore all skills in ~/.claude/skills/ — show me the weakest link.✓ コピーしました→ Ranked skill list with per-dimension scores
-
Diagnose the loserFor the weakest skill, what specifically is wrong?✓ コピーしました→ Concrete critique (vague description, conflicting with other skill, etc.)
-
Propose a fixSuggest a minimal edit to SKILL.md to fix it.✓ コピーしました→ Small, reviewable diff
-
Re-evaluateRe-run the eval and show before/after.✓ コピーしました→ Metrics improved, with evidence
結果: A measurably better skill bundle, with a reproducible eval process.
注意点
- Gaming the eval metric instead of helping real tasks — Include task-level downstream metrics (actual agent outcomes), not just text-level