How to diagnose a latency spike with Prometheus + Claude
Когда использовать: A service p99 alert fires — you need context without memorizing PromQL.
Предварительные требования
- Prometheus URL reachable — Set PROMETHEUS_URL in the MCP config; add auth if protected
Поток
-
Scope the spikeQuery http request p99 latency for service X in the last hour, 30-second resolution. Compare to the last 7 days baseline.✓ Скопировано→ Range query result showing the spike
-
Find correlated metricsFor the spike window, what other metrics for service X moved >2 sigma? CPU, memory, GC, queue depth?✓ Скопировано→ Candidate culprit metrics
-
Narrow by labelBreak down the spike by pod/host labels. Is it one pod or fleet-wide?✓ Скопировано→ Per-label decomposition
Итог: A hypothesis tied to specific metrics in under 5 minutes.
Подводные камни
- Query returns no data — Check label names with
list_metrics— label casing and delimiters vary between exporters