● Community · refreshdotdev ⚡ Instant

web-eval-agent

by refreshdotdev · refreshdotdev/web-eval-agent

Natural-language end-to-end tests for web apps — the agent drives a browser, captures console/network, and writes up findings.

web-eval-agent (refreshdotdev) lets you describe a user task in plain English; the MCP opens a browser, performs the flow, and reports screenshots, console logs, and network traffic. Useful for exploratory UX testing without writing Playwright scripts. Note: the original project is discontinued — the team has moved on to withrefresh.com — but the MCP remains functional under its existing license.


Installation

Choose your client

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json · Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ],
      "_inferred": true
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Restart the app after saving.
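If you would rather script the edit than paste JSON by hand, here is a minimal sketch that merges the web-eval-agent entry into an existing Claude Desktop config without disturbing other servers or settings. It assumes the file is plain JSON at the path shown above. The helper name `add_web_eval_agent` is hypothetical, and the entry reuses the `uvx web-eval-agent` command from this page, which the page itself marks as inferred.

```python
import json
from pathlib import Path

# Same command/args as the snippet above. That snippet is marked "_inferred",
# so treat the uvx invocation as unverified; the marker itself is not copied.
WEB_EVAL_ENTRY = {"command": "uvx", "args": ["web-eval-agent"]}


def add_web_eval_agent(config_path: Path) -> dict:
    """Insert (or overwrite) the web-eval-agent entry, keeping everything else.

    Hypothetical helper: other servers under "mcpServers" and any other
    top-level settings in the file are left untouched.
    """
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    servers["web-eval-agent"] = {
        "command": WEB_EVAL_ENTRY["command"],
        "args": list(WEB_EVAL_ENTRY["args"]),  # copy so the template is never mutated
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the macOS path, e.g. `add_web_eval_agent(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`, then restart Claude Desktop.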

~/.cursor/mcp.json (global) · .cursor/mcp.json (project)
{
  "mcpServers": {
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ],
      "_inferred": true
    }
  }
}

Cursor uses the same mcpServers schema as Claude Desktop. A project config takes precedence over the global one.

VS Code → Cline → MCP Servers → Edit
{
  "mcpServers": {
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ],
      "_inferred": true
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then "Edit Configuration".

~/.codeium/windsurf/mcp_config.json
{
  "mcpServers": {
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ],
      "_inferred": true
    }
  }
}

Same format as Claude Desktop. Restart Windsurf to apply the change.

~/.continue/config.json
{
  "mcpServers": [
    {
      "name": "web-eval-agent",
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ]
    }
  ]
}

Continue uses an array of server objects rather than a map keyed by server name.
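The map-versus-array difference is easy to trip over when copying a config between clients. A minimal sketch of the conversion, assuming every key other than the server name carries over unchanged into Continue's objects (an assumption, not something this page confirms); `to_continue_config` is a hypothetical helper name.

```python
def to_continue_config(mcp_servers: dict) -> dict:
    """Turn a Claude-style {"name": {...}} map into Continue's list-of-objects form.

    Assumption: every key other than the name carries over unchanged; keys that
    start with "_" (such as the "_inferred" marker used on this page) are dropped.
    """
    servers = []
    for name, spec in mcp_servers.items():
        entry = {"name": name}  # the map key becomes an explicit "name" field
        entry.update({k: v for k, v in spec.items() if not k.startswith("_")})
        servers.append(entry)
    return {"mcpServers": servers}
```

Feeding it the `mcpServers` map from the Claude Desktop snippet produces the Continue snippet shown above.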

~/.config/zed/settings.json
{
  "context_servers": {
    "web-eval-agent": {
      "command": {
        "path": "uvx",
        "args": [
          "web-eval-agent"
        ]
      }
    }
  }
}

Add the entry under context_servers. Zed reloads the settings automatically.
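Zed's shape differs in two ways: the top-level key is `context_servers`, and the command is nested as `{"path", "args"}`. The sketch below only maps `command` and `args`; any `env` block or other key would need handling this sketch does not attempt. The function name is hypothetical.

```python
def to_zed_settings(mcp_servers: dict) -> dict:
    """Nest each command/args pair under "command": {"path", "args"} for Zed.

    Only command and args are mapped; anything else in the entry (env, the
    "_inferred" marker) is ignored by this sketch.
    """
    return {
        "context_servers": {
            name: {"command": {"path": spec["command"], "args": list(spec.get("args", []))}}
            for name, spec in mcp_servers.items()
        }
    }
```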

claude mcp add web-eval-agent -- uvx web-eval-agent

One-line command. Verify with claude mcp list; remove with claude mcp remove web-eval-agent.

Use cases

Real-world scenarios: web-eval-agent

Smoke-test a web deploy with web-eval-agent

👤 Solo devs, small teams without a Playwright suite ⏱ ~10 min beginner

When to use: You deployed and want a quick "did I break anything obvious?" check.

Prerequisites
  • Free API key from operative.sh/mcp: sign up and copy the key
  • Playwright dependencies: npx playwright install (the MCP server prompts you if they are missing)
Flow
  1. Describe the test
    On staging.example.com, verify I can: sign up with a new email, create a project, log out. Report what fails.
    → Pass/fail with screenshots
  2. Dig into failures
    For the failed step, show the console errors and the network request that returned 500.
    → Stack-level evidence

Outcome: Post-deploy confidence in 2 minutes.

Pitfalls
  • Test accounts clutter your prod DB: always run against staging; if you must test prod, use a dedicated QA account and clean up afterward
Pair with: sentry

Exploratory UX evaluation of a new flow

👤 Designers, PMs ⏱ ~20 min intermediate

When to use: You want an outside perspective on a flow without scheduling user tests.

Flow
  1. Describe user intent, not steps
    As a first-time user, try to share a project with a colleague. Note every friction point.
    → Free-form UX critique with screenshots of each confusion
  2. Contrast with the happy path
    Now do the same flow as a power user who knows the UI. How much faster? What confused the noob but not the expert?
    → Comparative friction map

Outcome: Cheap UX heuristics before putting real users in front of it.

Test features behind login with persisted browser state

👤 Anyone testing authenticated flows ⏱ ~15 min intermediate

When to use: Your feature requires login, and you don't want the agent handling your password.

Flow
  1. Seed the session
    Call setup_browser_state opening https://app.example.com/login; I'll sign in myself.
    → Interactive browser opens; you log in; session saved
  2. Run the test using the saved state
    Test the billing settings page: load it, verify the current plan shows, try downgrading.
    → Test runs with your authenticated session

Outcome: Authenticated testing without sharing credentials with the agent.

Combinations

Combine with other MCPs for a 10x effect

web-eval-agent + sentry

Run an eval; any new errors land in Sentry for post-hoc review

Run the signup eval, then check Sentry for new error events captured during that window.

web-eval-agent + playwright

Prototype with web-eval-agent, harden into Playwright for CI

Convert the working web-eval-agent test into a Playwright spec I can run in CI.

Tools

What this MCP provides

Tool | Inputs | When to call | Cost
--- | --- | --- | ---
web_eval_agent | url: str, task: str, headless_browser?: bool | Any natural-language web test | LLM calls + browser time
setup_browser_state | url?: str | Once per service, to persist logged-in state | None
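The table gives the tool names and parameters but not how a client invokes them outside an IDE. As a sketch, assuming the official MCP Python SDK (`pip install mcp`) and the inferred `uvx web-eval-agent` launch command from the install section, a script can start the server over stdio and call `web_eval_agent` directly. The helper names `build_eval_arguments` and `run_eval` are hypothetical, and the absolute-URL check is an extra guard, not part of the tool's documented contract.

```python
import os


def build_eval_arguments(url: str, task: str, headless: bool = True) -> dict:
    """Arguments for web_eval_agent, using the parameter names from the table."""
    # Extra guard (not part of the tool's documented contract): require a full URL.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    return {"url": url, "task": task, "headless_browser": headless}


async def run_eval(url: str, task: str, headless: bool = True) -> None:
    """Launch the server over stdio, call web_eval_agent once, print text results."""
    # Imported here so build_eval_arguments works without the SDK installed.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="uvx",
        args=["web-eval-agent"],
        # Forward the whole environment so the operative.sh API key reaches the
        # server under whatever variable name its README specifies.
        env=dict(os.environ),
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "web_eval_agent", build_eval_arguments(url, task, headless)
            )
            for part in result.content:
                # Text parts hold the report; image parts (screenshots) are skipped.
                if part.type == "text":
                    print(part.text)
```

Usage: `asyncio.run(run_eval("https://staging.example.com", "Sign up with a new email, create a project, log out. Report what fails."))`. A call to `setup_browser_state` would follow the same pattern with only a `url` argument.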

Cost and limits

What it costs

API quota
Free tier from operative.sh
Tokens per call
A full eval can use 5-20k tokens (including screenshot descriptions)
Price
Free for low volume
Tip
For repetitive tests, graduate them to Playwright; use web-eval-agent for exploration
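To see why the tip matters, multiply the per-eval figure by how often a test runs. The sketch below only does that arithmetic: the 5,000 and 20,000 bounds come from the estimate above, and the run counts are example inputs, not measurements.

```python
def token_range(runs_per_day: int, days: int = 30,
                low_per_eval: int = 5_000, high_per_eval: int = 20_000) -> tuple:
    """Lower and upper token totals for a repeated eval, from the 5-20k estimate."""
    runs = runs_per_day * days
    return runs * low_per_eval, runs * high_per_eval


# 10 runs a day for 30 days = 300 evals, i.e. 1.5M to 6M tokens.
low, high = token_range(10)
print(f"{low:,} to {high:,} tokens")
```

At that volume a scripted Playwright test, which costs no LLM tokens per run, is usually the better home for the check.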

Security

Permissions, secrets, blast radius

Credential storage: operative.sh API key in an environment variable; browser state saved locally
Outbound traffic: target sites, plus operative.sh for eval orchestration

Troubleshooting

Common errors and fixes

Browser fails to launch

Install Playwright's system dependencies: npx playwright install-deps (or npx playwright install --with-deps to fetch the browsers as well)

Session keeps expiring

Some sites rotate cookies; re-run setup_browser_state. Or use Playwright's storageState for finer control
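Before re-running setup_browser_state blindly, it can help to check which cookies are about to lapse. This sketch assumes the state is a Playwright storageState JSON file (the format written by `BrowserContext.storage_state(path=...)` in Playwright's Python API); whether web-eval-agent stores its session in exactly that format is an assumption, and both function names are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> dict:
    """Read a storageState file: {"cookies": [...], "origins": [...]}."""
    return json.loads(path.read_text(encoding="utf-8"))


def expiring_cookies(state: dict, within_seconds: float = 3600,
                     now: Optional[float] = None) -> list:
    """Names of cookies already expired or expiring within the window.

    Playwright records "expires" as Unix seconds, and -1 for session cookies,
    which have no fixed expiry and are skipped here.
    """
    now = time.time() if now is None else now
    deadline = now + within_seconds
    return sorted(
        cookie["name"]
        for cookie in state.get("cookies", [])
        if cookie.get("expires", -1) != -1 and cookie["expires"] <= deadline
    )
```

If your auth cookies show up in the result, re-seed the session. With plain Playwright you can also load the same file into a fresh context via `browser.new_context(storage_state="state.json")`.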

Agent misunderstands the task

Be specific: URLs, selectors or text to look for, expected outcomes
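One way to apply that advice consistently is to assemble the task string from explicit parts: the URL, numbered steps, and pass criteria. The hypothetical `build_task` helper below is only a prompt template; the wording it produces is a suggestion, not a format web-eval-agent requires.

```python
def build_task(url: str, steps: list, expect: list) -> str:
    """Compose a specific task: target URL, numbered steps, explicit pass criteria."""
    lines = [f"Go to {url}."]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    if expect:
        # Concrete, checkable outcomes give the agent a clear pass/fail signal.
        lines.append("Pass only if: " + "; ".join(expect) + ".")
    lines.append("Report each failed step with its console errors and failing network requests.")
    return "\n".join(lines)
```

For example, `build_task("https://staging.example.com/signup", ["Fill in the email field", "Click 'Create account'"], ["the text 'Welcome' is visible"])` yields a five-line task that names the page, the exact button text, and the success condition.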

Alternatives

How web-eval-agent compares

Alternative | When to use | Trade-off
--- | --- | ---
Playwright MCP | You want scriptable, reproducible tests | You write the code
Browserbase MCP | You need cloud-hosted browsers for CI | Paid per minute

More

Resources

📖 Read the official README on GitHub

🐙 Open issues
