Enforce per-team token quotas across Azure OpenAI deployments
Когда использовать: Multiple product teams share AOAI; one team's runaway loop shouldn't burn the shared TPM budget.
Предварительные требования
- APIM instance with the AI-Gateway patterns applied — Deploy the reference architecture from the Azure-Samples/AI-Gateway repo
- APIM subscription key per team — Each team gets a distinct APIM subscription (key) they include in the Ocp-Apim-Subscription-Key header
Поток
-
Review current quotasList APIM subscriptions with their current TPM and RPM quotas for the AOAI product.✓ Скопировано→ Per-team quota table
-
Adjust a noisy team downTeam 'growth' is at 90% TPM burn daily. Reduce their quota from 200k → 100k TPM. Keep others unchanged.✓ Скопировано→ Quota updated; confirmation
-
Monitor after the changeOver the next hour, pull 429 (rate-limited) counts per subscription. Confirm growth is being shaped but prod-critical teams aren't affected.✓ Скопировано→ Enforcement visible in metrics
Итог: Controlled shared AOAI spend without nuking legit high-priority traffic.
Подводные камни
- Setting quotas too low starves legitimate workloads — Roll out in shadow mode first (log-only), then enforce once you understand real patterns