● Official Azure-Samples 🔑 Bring your own key

Azure AI Gateway

by Azure-Samples · Azure-Samples/AI-Gateway

Microsoft's APIM-based AI Gateway patterns — route, meter, and govern LLM traffic (including MCP) from Azure API Management.

Azure AI Gateway is a reference-implementation repo from Microsoft showing how to put Azure API Management (APIM) in front of LLM/MCP endpoints for auth, quota, caching, routing, logging, and circuit-breaking. The MCP server exposes these gateway operations so an agent can configure and inspect them.
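For orientation, an app behind the gateway sends completion calls to the APIM front door with its team's subscription key, not to the AOAI resource directly. A minimal sketch of assembling such a request; the hostname, API path, and key below are placeholders, not values from this repo:

```python
# Sketch of a gateway-routed chat-completions request; all values are placeholders.
GATEWAY = "https://contoso-apim.azure-api.net/openai"  # the APIM front door, not the AOAI endpoint

def build_gateway_request(deployment: str, team_key: str, payload: dict) -> dict:
    """Assemble the request an app would send through the gateway."""
    return {
        "url": f"{GATEWAY}/deployments/{deployment}/chat/completions?api-version=2024-02-01",
        "headers": {
            # APIM attributes usage to the team via this header, enabling quota and metering:
            "Ocp-Apim-Subscription-Key": team_key,
            "Content-Type": "application/json",
        },
        "json": payload,
    }

req = build_gateway_request("gpt-4o", "team-growth-key", {"messages": [{"role": "user", "content": "hi"}]})
assert "deployments/gpt-4o/chat/completions" in req["url"]
```

From the app's perspective only the base URL and the key change; the request and response bodies pass through the gateway unmodified.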

Why use it

Key features

Live demo

What it looks like in practice


Installation

Choose your client

~/Library/Application Support/Claude/claude_desktop_config.json  · Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "azure-ai-gateway": {
      "command": "uvx",
      "args": [
        "azure-ai-gateway-mcp"
      ]
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Restart after saving.

~/.cursor/mcp.json · .cursor/mcp.json
{
  "mcpServers": {
    "azure-ai-gateway": {
      "command": "uvx",
      "args": [
        "azure-ai-gateway-mcp"
      ]
    }
  }
}

Cursor uses the same mcpServers schema as Claude Desktop. The project config takes precedence over the global one.

VS Code → Cline → MCP Servers → Edit
{
  "mcpServers": {
    "azure-ai-gateway": {
      "command": "uvx",
      "args": [
        "azure-ai-gateway-mcp"
      ]
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then "Edit Configuration".

~/.codeium/windsurf/mcp_config.json
{
  "mcpServers": {
    "azure-ai-gateway": {
      "command": "uvx",
      "args": [
        "azure-ai-gateway-mcp"
      ]
    }
  }
}

Same format as Claude Desktop. Restart Windsurf to apply.

~/.continue/config.json
{
  "mcpServers": [
    {
      "name": "azure-ai-gateway",
      "command": "uvx",
      "args": [
        "azure-ai-gateway-mcp"
      ]
    }
  ]
}

Continue uses an array of server objects, not a map.

~/.config/zed/settings.json
{
  "context_servers": {
    "azure-ai-gateway": {
      "command": {
        "path": "uvx",
        "args": [
          "azure-ai-gateway-mcp"
        ]
      }
    }
  }
}

Add to context_servers. Zed reloads automatically.

claude mcp add azure-ai-gateway -- uvx azure-ai-gateway-mcp

A one-line install. Verify with claude mcp list; remove with claude mcp remove azure-ai-gateway.

Use cases

Real-world scenarios: Azure AI Gateway

Enforce per-team token quotas across Azure OpenAI deployments

👤 Central platform teams governing LLM spend ⏱ ~30 min advanced

When to use: Multiple product teams share AOAI; one team's runaway loop shouldn't burn the shared TPM budget.

Prerequisites
  • APIM instance with the AI-Gateway patterns applied — Deploy the reference architecture from the Azure-Samples/AI-Gateway repo
  • APIM subscription key per team — Each team gets a distinct APIM subscription (key) they include in the Ocp-Apim-Subscription-Key header
Flow
  1. Review current quotas
    List APIM subscriptions with their current TPM and RPM quotas for the AOAI product.
    → Per-team quota table
  2. Adjust a noisy team down
    Team 'growth' is at 90% TPM burn daily. Reduce their quota from 200k → 100k TPM. Keep others unchanged.
    → Quota updated; confirmation
  3. Monitor after the change
    Over the next hour, pull 429 (rate-limited) counts per subscription. Confirm growth is being shaped but prod-critical teams aren't affected.
    → Enforcement visible in metrics

Outcome: Controlled shared AOAI spend without starving legitimate high-priority traffic.

Pitfalls
  • Setting quotas too low starves legitimate workloads — Roll out in shadow mode first (log-only), then enforce once you understand real patterns
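The per-team shaping described above can be modeled as a rolling one-minute token counter keyed by subscription. A simplified illustration of the idea (not APIM's actual implementation), using the quota numbers from the scenario:

```python
import time

class TpmQuota:
    """Rolling one-minute tokens-per-minute quota per APIM subscription key (simplified)."""
    def __init__(self, quotas):
        self.quotas = quotas    # subscription key -> TPM limit
        self.windows = {}       # subscription key -> (window_start, tokens_used)

    def allow(self, key, tokens, now=None):
        now = time.monotonic() if now is None else now
        start, used = self.windows.get(key, (now, 0))
        if now - start >= 60:   # a new one-minute window begins
            start, used = now, 0
        if used + tokens > self.quotas[key]:
            return False        # the gateway would answer 429 here
        self.windows[key] = (start, used + tokens)
        return True

q = TpmQuota({"growth": 100_000, "prod-critical": 200_000})
assert q.allow("growth", 90_000, now=0.0)
assert not q.allow("growth", 20_000, now=1.0)      # growth is shaped at its new 100k TPM
assert q.allow("prod-critical", 150_000, now=1.0)  # other teams are unaffected
```

This is also why shadow mode first is sound advice: run the counter, log the would-be 429s, and only flip to enforcement once the logged rejections match your intent.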

Configure multi-region failover for an Azure OpenAI deployment

👤 SREs running production AI workloads ⏱ ~45 min advanced

When to use: A regional AOAI outage (uncommon but real) should trigger transparent failover to another region.

Prerequisites
  • AOAI deployments in ≥2 regions (e.g. East US, West Europe) — Provision via Azure portal; match model + version
Flow
  1. Inspect current backend pool
    Show the APIM backend pool for our AOAI API. How many backends, priority, circuit-breaker config?
    → Current pool config
  2. Add a secondary region
    Add the West Europe AOAI endpoint as priority=2 with circuit-breaker: 5 failures in 1 min → open for 5 min. Keep East US as primary.
    → Pool updated, 2 backends configured
  3. Test failover
    Simulate primary outage by disabling the East US backend for 2 min. Confirm traffic shifts to West Europe, then rollback.
    → Traffic shift observed; rollback verified

Outcome: Transparent failover with evidence it works before you need it.

Pitfalls
  • Different regions have different deployed model versions — Pin to a model version that exists in both regions; mismatched versions silently return different quality
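The pool behaviour in step 2 (priority routing plus a 5-failures-in-1-minute breaker that opens for 5 minutes) can be sketched as a small state machine. This illustrates the pattern, not APIM's internal logic:

```python
class Backend:
    """One pool member with a count-based circuit breaker (simplified)."""
    def __init__(self, name, priority, threshold=5, window=60.0, open_for=300.0):
        self.name, self.priority = name, priority
        self.threshold, self.window, self.open_for = threshold, window, open_for
        self.failures = []       # timestamps of recent failures
        self.open_until = 0.0    # breaker is open (backend skipped) until this time

    def record_failure(self, now):
        self.failures = [t for t in self.failures if now - t < self.window] + [now]
        if len(self.failures) >= self.threshold:  # e.g. 5 failures in 1 min -> open 5 min
            self.open_until = now + self.open_for

    def available(self, now):
        return now >= self.open_until

def pick_backend(pool, now):
    """Route to the lowest priority number among backends whose breaker is closed."""
    live = [b for b in pool if b.available(now)]
    return min(live, key=lambda b: b.priority) if live else None

east = Backend("eastus", priority=1)
west = Backend("westeurope", priority=2)
pool = [east, west]
assert pick_backend(pool, 0.0).name == "eastus"       # healthy: primary wins
for t in range(5):
    east.record_failure(float(t))                     # 5 failures inside one minute
assert pick_backend(pool, 5.0).name == "westeurope"   # breaker open -> failover
assert pick_backend(pool, 310.0).name == "eastus"     # breaker times out -> back to primary
```

The last assertion is the part worth testing in step 3: traffic should return to the primary once the breaker window expires, not stay pinned to the secondary.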

Deploy semantic caching to reduce repeat prompt costs

👤 Cost-conscious platform teams ⏱ ~30 min advanced

When to use: Your users ask similar questions over and over; 30–60% of calls are effectively cache hits.

Flow
  1. Turn on semantic cache policy
    Enable the APIM semantic-cache-lookup policy on the AOAI completions API with similarity threshold 0.95, TTL 1h.
    → Policy applied
  2. Observe hit rate
    After 24h, pull cache hit rate and token savings from App Insights.
    → Hit rate % + tokens saved
  3. Tune threshold
    If hit rate <20%, lower threshold to 0.92 and observe again. If quality complaints, raise back to 0.97.
    → Iterative tuning with measurements

Outcome: Measured cost savings on repeat queries without degrading output quality.

Pitfalls
  • Over-aggressive caching serves wrong answers for similar-but-different questions — Start high (0.97) and only lower based on observed quality
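The lookup the semantic-cache policy performs can be illustrated with a toy in-memory version: embed the prompt, find the nearest cached entry, and serve it only if cosine similarity clears the threshold. The embedder here is a hard-coded stub; a real deployment uses an embeddings model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Toy semantic cache: serve a stored response only above the similarity threshold."""
    def __init__(self, embed, threshold=0.95):
        self.embed, self.threshold = embed, threshold
        self.entries = []  # list of (embedding, cached_response)

    def store(self, prompt, response):
        self.entries.append((self.embed(prompt), response))

    def lookup(self, prompt):
        v = self.embed(prompt)
        best = max(self.entries, key=lambda e: cosine(v, e[0]), default=None)
        if best and cosine(v, best[0]) >= self.threshold:
            return best[1]  # cache hit: no AOAI call, no tokens spent
        return None

# Stub embedder: a real cache calls an embeddings deployment instead.
fake_embed = {"reset password": [1.0, 0.0],
              "reset my password": [0.99, 0.14],
              "delete account": [0.0, 1.0]}.get
cache = SemanticCache(fake_embed, threshold=0.95)
cache.store("reset password", "Use the self-service portal.")
assert cache.lookup("reset my password") is not None  # similar enough -> hit
assert cache.lookup("delete account") is None         # different question -> miss
```

The threshold is exactly the knob from step 3: raising it toward 0.97 turns near-miss pairs like the first two prompts into misses, trading savings for safety.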

Combinations

Combine with other MCPs for a 10x effect

azure-ai-gateway + sentry

Correlate APIM 5xx spikes with application-side errors

If Sentry shows 5xx spike in app X at 10:02, pull APIM metrics for the same window and identify if the gateway caused it.
azure-ai-gateway + notion

Weekly AI-traffic governance report to Notion

Compile per-team TPM usage for the week, 429 counts, cache hit rate; post as a Notion page.

Tools

What this MCP provides

Tool | Inputs | When to call | Cost
list_subscriptions | product_id? | Inventory teams consuming the gateway | free (ARM API call)
update_quota | subscription_id, tpm?, rpm? | Adjust a team's token/request limits | free
get_backend_pool | api_id | Inspect routing and failover config | free
update_backend_pool | api_id, backends, policies | Change priorities, circuit breakers, load balancing | free
apply_policy | api_id, policy_xml | Deploy APIM policy (cache, auth, logging) | free
get_metrics | api_id, since, until | Observe traffic shape per API | free

Cost and limits

What it costs

API quota
Azure Resource Manager rate limits (generous per tenant)
Tokens per call
Policy/backend-pool reads: 500–2000 tokens
Money
APIM pricing starts at ~$40/mo (Basic tier); Standard tier recommended for prod
Tip
Semantic caching usually pays for APIM's cost many times over if your traffic repeats. Measure hit rate to justify.
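The tip above is easy to sanity-check with a back-of-the-envelope break-even calculation. All inputs below are hypothetical; substitute your own traffic volume and token pricing:

```python
def monthly_savings(calls_per_month, avg_tokens, price_per_1k, hit_rate, gateway_cost):
    """Tokens avoided by cache hits, priced, minus the gateway's monthly cost."""
    saved_tokens = calls_per_month * hit_rate * avg_tokens
    return saved_tokens / 1000 * price_per_1k - gateway_cost

# Hypothetical inputs: 1M calls/mo, 1500 tokens/call, $0.01 per 1k tokens,
# and the ~$40/mo gateway figure quoted above.
assert monthly_savings(1_000_000, 1500, 0.01, 0.30, 40) > 0    # a 30% hit rate pays off easily
assert monthly_savings(1_000_000, 1500, 0.01, 0.002, 40) < 0   # a near-zero hit rate does not
```

The measured hit rate from App Insights is the only input you can't guess, which is why the doc says to measure it before claiming savings.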

Security

Permissions, secrets, blast radius

Minimum scopes: APIM Contributor on the target APIM instance
Credential storage: Azure service principal credentials (client id/secret/tenant) in env vars
Outbound traffic: ARM API calls to management.azure.com; prompt/response bodies traverse APIM itself
Never grant: Owner on the subscription; Global Administrator
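Since the credentials live in environment variables, a startup sanity check avoids confusing mid-session 401s. The first three names below follow the common AZURE_* convention used by Azure environment credentials; APIM_RESOURCE_ID is a hypothetical stand-in for however this server is pointed at the APIM instance, so check the repo's README for the real names:

```python
# Variable names are assumptions (see lead-in); APIM_RESOURCE_ID is hypothetical.
REQUIRED = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "APIM_RESOURCE_ID")

def missing_credentials(env):
    """Return which required settings are absent, so startup can fail fast."""
    return [k for k in REQUIRED if not env.get(k)]

assert missing_credentials({}) == list(REQUIRED)
assert missing_credentials({k: "set" for k in REQUIRED}) == []
```

Run this against os.environ before launching the server; failing fast here is cheaper than diagnosing a 401 from ARM later.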

Troubleshooting

Common errors and fixes

401 on ARM API calls

Service principal lacks APIM Contributor role on the resource group. Grant via portal or az cli.

Verify: az role assignment list --assignee <sp>
Policy apply fails — XML validation error

APIM policy XML is strict; use the portal's policy editor to validate, then copy-paste.

429s persist after raising TPM quota

Underlying AOAI deployment itself may be the bottleneck. Check deployment TPM, not just APIM subscription TPM.

Semantic cache hit rate is 0%

Embedding backend for cache-lookup not configured; check the cache policy's embeddings reference.

Alternatives

Azure AI Gateway compared

Alternative | When to use | Trade-off
Cloudflare AI Gateway | You're on Cloudflare and want multi-provider LLM routing out of the box | Less deep integration with Azure-hosted models
Portkey / LiteLLM | You want a provider-agnostic gateway with a dashboard | Third-party SaaS; data leaves your cloud

More

Resources

📖 Read the official README on GitHub

🐙 Open issues

🔍 All 400+ MCP servers and Skills