Triage a CloudWatch alarm by correlating logs, metrics, and recent deploys
Когда использовать: An alarm just fired and you want to go from 'which service, which deploy, which log line' without tabbing through the console.
Предварительные требования
- AWS credentials with CloudWatch + CloudFormation read —
aws sso loginwith a role that has ReadOnlyAccess managed policy - aws-cloudwatch-mcp server running —
uvx awslabs.cloudwatch-mcp-server— or install the bundle
Поток
-
Pull the alarm details and affected resourcesDescribe CloudWatch alarm 'prod-api-5xx-high'. What resource does it watch, what threshold, what's the current state?✓ Скопировано→ Alarm config plus state history (when it flipped)
-
Query logs around the breachRun a Logs Insights query over the /aws/ecs/prod-api log group from 10 minutes before the alarm fired until now. Find ERROR-level log lines grouped by message template.✓ Скопировано→ Top error templates with counts
-
Correlate with recent deploysList CodeDeploy deployments to the prod-api service in the last 6 hours. Does any deploy time correlate with the error spike?✓ Скопировано→ Deploy timeline lined up against error onset
Итог: A concrete hypothesis like 'deploy abc123 at 14:22 UTC correlates with 5xx onset at 14:23' with the evidence to back it.
Подводные камни
- Logs Insights queries against a big log group without a time window cost real money — Always include
@timestampbounds narrower than 1 hour; the MCP won't stop you from billing $$$ - Cross-account resources need the right credential profile — Set
AWS_PROFILEenv var per server invocation; don't assume the default profile is the one you want