Enforce per-team token quotas across Azure OpenAI deployments
Wann einsetzen: Multiple product teams share AOAI; one team's runaway loop shouldn't burn the shared TPM budget.
Voraussetzungen
- APIM instance with the AI-Gateway patterns applied — Deploy the reference architecture from the Azure-Samples/AI-Gateway repo
- APIM subscription key per team — Each team gets a distinct APIM subscription (key) they include in the Ocp-Apim-Subscription-Key header
Ablauf
-
Review current quotasList APIM subscriptions with their current TPM and RPM quotas for the AOAI product.✓ Kopiert→ Per-team quota table
-
Adjust a noisy team downTeam 'growth' is at 90% TPM burn daily. Reduce their quota from 200k → 100k TPM. Keep others unchanged.✓ Kopiert→ Quota updated; confirmation
-
Monitor after the changeOver the next hour, pull 429 (rate-limited) counts per subscription. Confirm growth is being shaped but prod-critical teams aren't affected.✓ Kopiert→ Enforcement visible in metrics
Ergebnis: Controlled shared AOAI spend without nuking legit high-priority traffic.
Fallstricke
- Setting quotas too low starves legitimate workloads — Roll out in shadow mode first (log-only), then enforce once you understand real patterns