87 ML-research skills covering training, fine-tuning, distributed systems, inference, and paper writing — Claude becomes a credible ML infra collaborator.
A curated library of Agent Skills for AI research and engineering. Each skill (vLLM, DeepSpeed, Axolotl, TRL, Flash Attention, Unsloth, LLaMA-Factory, etc.) ships a SKILL.md with a 50-150-line quick reference plus 300KB+ of primary reference material. An autoresearch orchestrator skill routes between them for end-to-end experimentation.
Set up a vLLM server with correct tensor-parallelism for your GPUs
👤 Infra engineers serving LLMs in production · ⏱ ~30 min · advanced
When to use: You have 2-8 GPUs and want Claude to pick the right --tensor-parallel-size and --max-model-len.
Flow
State the hardware and model
Use the vllm skill. Serve Qwen2.5-72B on 4x H100s. Give me the exact launch command and sanity tests.
→ Correct TP size and quantization recommendation
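A recommendation along these lines is what the skill should produce. The sketch below is illustrative, not a verified deployment: Qwen2.5-72B has 64 attention heads and 8 KV heads, so TP=4 divides both, but the context length and memory flags are assumptions you should tune for your workload.

```shell
# Sketch: serve Qwen2.5-72B across 4x H100 80GB (flag values are illustrative).
# 72B weights at bf16 are ~144 GB, so 4x 80 GB cards fit without quantization.
vllm serve Qwen/Qwen2.5-72B-Instruct \
  --tensor-parallel-size 4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.90

# Sanity test against the OpenAI-compatible endpoint once the server is up:
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-72B-Instruct", "prompt": "Say hi", "max_tokens": 8}'
```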
Ask for load-test script
Now give me a locust or vllm-benchmark script to verify throughput.
→ Runnable benchmark using the right endpoint format
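The script Claude returns will typically wrap vLLM's bundled serving benchmark. A sketch of the invocation, run from a checkout of the vLLM repo (flag names vary between versions, so verify against --help before relying on the numbers):

```shell
# Sketch: load-test the running server with vLLM's serving benchmark
# (assumes a vLLM repo checkout; flags are from recent versions).
python benchmarks/benchmark_serving.py \
  --backend vllm \
  --model Qwen/Qwen2.5-72B-Instruct \
  --num-prompts 200 \
  --request-rate 4 \
  --port 8000
```

Record the reported throughput and latency percentiles as your baseline before changing any flags.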
Outcome: A vLLM deployment with sanity checks and a benchmark baseline.
Pitfalls
Claude picks a TP size that doesn't divide the model's attention heads. The vLLM skill's references list valid TP sizes per model family; ask Claude to cite them.
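The divisibility constraint itself is cheap to pre-check. A minimal sketch, assuming you read num_attention_heads and num_key_value_heads from the model's config.json (KV heads matter too, since they are also sharded across TP ranks):

```shell
# check_tp NUM_ATTENTION_HEADS NUM_KV_HEADS TP_SIZE -> prints "ok" or "bad".
# Both head counts must be divisible by the tensor-parallel size.
check_tp() {
  local heads=$1 kv_heads=$2 tp=$3
  if [ $((heads % tp)) -eq 0 ] && [ $((kv_heads % tp)) -eq 0 ]; then
    echo "ok"
  else
    echo "bad"
  fi
}

check_tp 64 8 4   # Qwen2.5-72B on 4 GPUs: 64/4 and 8/4 both divide -> ok
check_tp 64 8 6   # TP=6 doesn't divide 64 attention heads -> bad
```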