Triage a CloudWatch alarm by correlating logs, metrics, and recent deploys
When to use: An alarm just fired and you want to go from 'which service, which deploy, which log line' without tabbing through the console.
Prerequisites
- AWS credentials with CloudWatch + CloudFormation read —
aws sso loginwith a role that has ReadOnlyAccess managed policy - aws-cloudwatch-mcp server running —
uvx awslabs.cloudwatch-mcp-server— or install the bundle
Flow
-
Pull the alarm details and affected resourcesDescribe CloudWatch alarm 'prod-api-5xx-high'. What resource does it watch, what threshold, what's the current state?✓ Copied→ Alarm config plus state history (when it flipped)
-
Query logs around the breachRun a Logs Insights query over the /aws/ecs/prod-api log group from 10 minutes before the alarm fired until now. Find ERROR-level log lines grouped by message template.✓ Copied→ Top error templates with counts
-
Correlate with recent deploysList CodeDeploy deployments to the prod-api service in the last 6 hours. Does any deploy time correlate with the error spike?✓ Copied→ Deploy timeline lined up against error onset
Outcome: A concrete hypothesis like 'deploy abc123 at 14:22 UTC correlates with 5xx onset at 14:23' with the evidence to back it.
Pitfalls
- Logs Insights queries against a big log group without a time window cost real money — Always include
@timestampbounds narrower than 1 hour; the MCP won't stop you from billing $$$ - Cross-account resources need the right credential profile — Set
AWS_PROFILEenv var per server invocation; don't assume the default profile is the one you want