Find the skill pulling your agent's performance down
何时使用: You feel the agent has gotten worse, not better, as you added skills.
前置条件
- Node 20+ — nvm install 20
- Skill cloned and installed — git clone https://github.com/Evol-ai/SkillCompass ~/.claude/skills/SkillCompass; npm i
步骤
-
Run the evaluatorScore all skills in ~/.claude/skills/ — show me the weakest link.✓ 已复制→ Ranked skill list with per-dimension scores
-
Diagnose the loserFor the weakest skill, what specifically is wrong?✓ 已复制→ Concrete critique (vague description, conflicting with other skill, etc.)
-
Propose a fixSuggest a minimal edit to SKILL.md to fix it.✓ 已复制→ Small, reviewable diff
-
Re-evaluateRe-run the eval and show before/after.✓ 已复制→ Metrics improved, with evidence
结果: A measurably better skill bundle, with a reproducible eval process.
注意事项
- Gaming the eval metric instead of helping real tasks — Include task-level downstream metrics (actual agent outcomes), not just text-level