How to diagnose a latency spike with Prometheus + Claude
When to use: A service p99 alert fires and you need context without memorizing PromQL.
Prerequisites
- Prometheus URL reachable: set PROMETHEUS_URL in the MCP config, and add authentication if the server is protected
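Before wiring the URL into the MCP config, it can help to confirm that the server answers at all. The sketch below is one way to do that from Python, using Prometheus's `/-/healthy` endpoint. The function names are hypothetical, and the bearer-token header is an assumption about how a protected server might be set up; basic auth or a proxy header would need a different header.

```python
import urllib.request


def build_health_request(base_url, token=None):
    """Build a GET request for the Prometheus /-/healthy endpoint."""
    url = base_url.rstrip("/") + "/-/healthy"
    headers = {}
    if token:
        # Assumption: the server is protected with bearer-token auth.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def is_reachable(base_url, token=None, timeout=5.0):
    """Return True if the health endpoint answers with HTTP 200."""
    request = build_health_request(base_url, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Calling `is_reachable(os.environ["PROMETHEUS_URL"])` from a shell where the variable is set gives a quick yes/no before any prompt is sent.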
Flow
- Scope the spike
  Prompt: Query HTTP request p99 latency for service X in the last hour, 30-second resolution. Compare to the last 7 days baseline.
  → Range query result showing the spike
- Find correlated metrics
  Prompt: For the spike window, what other metrics for service X moved >2 sigma? CPU, memory, GC, queue depth?
  → Candidate culprit metrics
- Narrow by label
  Prompt: Break down the spike by pod/host labels. Is it one pod or fleet-wide?
  → Per-label decomposition
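For readers who want to see roughly what the scoping and label-breakdown prompts translate to, here is a minimal sketch that builds the PromQL expression and the `/api/v1/query_range` URL by hand. The metric name `http_request_duration_seconds_bucket` and the `service` and `pod` labels are assumptions based on common exporter conventions; the names in a given setup may differ, and the MCP server may well build its queries another way.

```python
from urllib.parse import urlencode


def p99_expr(service, metric="http_request_duration_seconds_bucket",
             by=(), window="5m"):
    """PromQL for p99 latency from a histogram metric.

    `by` adds breakdown labels (for example "pod") next to the
    mandatory `le` bucket label.
    """
    group = ", ".join(("le",) + tuple(by))
    return (
        f"histogram_quantile(0.99, sum by ({group}) "
        f'(rate({metric}{{service="{service}"}}[{window}])))'
    )


def range_query_url(base_url, expr, end, lookback_s=3600, step_s=30):
    """URL for a range query covering `lookback_s` seconds up to `end`."""
    params = {
        "query": expr,
        "start": end - lookback_s,  # Unix timestamps in seconds
        "end": end,
        "step": step_s,             # resolution in seconds
    }
    return base_url.rstrip("/") + "/api/v1/query_range?" + urlencode(params)
```

The last-hour view uses the defaults; passing `by=("pod",)` gives the per-pod decomposition. For the 7-day baseline, `lookback_s=7 * 86400` with a coarser step such as `step_s=300` keeps the number of returned points manageable.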
Result: A hypothesis tied to specific metrics in under 5 minutes.
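The "moved >2 sigma" test from the correlation step can be sketched as a simple comparison of the spike-window mean against the baseline mean, measured in baseline standard deviations. This is only one reasonable reading of that prompt, not necessarily what Claude computes; the function names and the input shape are hypothetical.

```python
import math
from statistics import mean, stdev


def sigma_shift(baseline, spike):
    """Distance of the spike-window mean from the baseline mean,
    in units of the baseline sample standard deviation."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    diff = mean(spike) - mean(baseline)
    sd = stdev(baseline)
    if sd == 0:
        # Flat baseline: any change at all counts as an unbounded shift.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sd


def moved_metrics(series, threshold=2.0):
    """Keep metrics whose shift exceeds `threshold` sigma in either direction.

    `series` maps metric name -> (baseline_values, spike_values).
    """
    shifts = {name: sigma_shift(b, s) for name, (b, s) in series.items()}
    return {name: z for name, z in shifts.items() if abs(z) > threshold}
```

Feeding it the values from range queries for CPU, memory, GC, and queue depth yields the shortlist of candidate culprit metrics, each with a signed shift that shows whether it rose or fell.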
Pitfalls
- Query returns no data: check label names with list_metrics, since label casing and delimiters vary between exporters
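When an empty result comes down to a label-name mismatch, comparing the expected name against what the server actually exposes narrows it quickly. The sketch below goes through the Prometheus HTTP API (`/api/v1/labels`) rather than the `list_metrics` tool, and matches names while ignoring case and delimiters. The helper names are hypothetical.

```python
import re
from urllib.parse import urlencode


def labels_url(base_url, metric):
    """URL listing the label names present on series matching `metric`."""
    query = urlencode({"match[]": metric})
    return base_url.rstrip("/") + "/api/v1/labels?" + query


def _normalize(name):
    # Lowercase and drop everything except letters and digits,
    # so "pod_name", "podName" and "Pod-Name" compare equal.
    return re.sub(r"[^a-z0-9]", "", name.lower())


def label_variants(wanted, available):
    """Return the available label names that match `wanted`
    once case and delimiters are ignored."""
    target = _normalize(wanted)
    return [name for name in available if _normalize(name) == target]
```

The endpoint responds with JSON whose `data` field holds the list of label names; passing that list as `available` shows which spelling the exporter really uses.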