web-eval-agent MCP: Installation & Live Demo

Why use it

Key features

Describe tests in plain English, not selectors and asserts
Captures screenshots, console errors, network requests
setup_browser_state persists logins so tests can run behind auth
Works in Cursor, Claude, and similar coding agents

Live demo

See it in action


Installation

Choose your client

Claude Desktop · macOS: ~/Library/Application Support/Claude/claude_desktop_config.json · Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ],
      "_inferred": true
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Restart the app after saving.
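If the config file already lists other servers, pasting the snippet over the whole file would drop them. The sketch below merges a single entry into an existing file instead. The helper name add_mcp_server is hypothetical, and the caller supplies the file path; it is an illustration, not part of web-eval-agent or Claude Desktop.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, command, args):
    """Add or update one entry under "mcpServers", keeping every other key.

    Hypothetical helper for illustration only.
    """
    path = Path(config_path)
    # Start from the existing config if the file exists and is non-empty.
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Create the "mcpServers" map only when it is missing.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, add_mcp_server(path, "web-eval-agent", "uvx", ["web-eval-agent"]) leaves other servers and top-level settings in the file untouched.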

Cursor · Global: ~/.cursor/mcp.json · Project: .cursor/mcp.json

{
  "mcpServers": {
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ],
      "_inferred": true
    }
  }
}

Cursor uses the same mcpServers schema as Claude Desktop. Project settings take precedence over global ones.

VS Code → Cline → MCP Servers → Edit

{
  "mcpServers": {
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ],
      "_inferred": true
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then choose "Edit Configuration".

~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ],
      "_inferred": true
    }
  }
}

Same format as Claude Desktop. Takes effect after restarting Windsurf.

~/.continue/config.json

{
  "mcpServers": [
    {
      "name": "web-eval-agent",
      "command": "uvx",
      "args": [
        "web-eval-agent"
      ]
    }
  ]
}

Continue uses an array of server objects rather than a map.
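The difference between the two shapes is mechanical: each key in the map becomes a "name" field on an object in the array. A minimal Python sketch of that conversion follows; servers_map_to_list is a hypothetical name, not something Continue provides.

```python
def servers_map_to_list(servers):
    """Convert {"name": {...}} (map form) into [{"name": "name", ...}] (array form)."""
    # The map key moves into a "name" field; all other fields are copied as-is.
    return [{"name": name, **entry} for name, entry in servers.items()]


map_form = {"web-eval-agent": {"command": "uvx", "args": ["web-eval-agent"]}}
array_form = servers_map_to_list(map_form)
```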

~/.config/zed/settings.json

{
  "context_servers": {
    "web-eval-agent": {
      "command": {
        "path": "uvx",
        "args": [
          "web-eval-agent"
        ]
      }
    }
  }
}

Add the entry under context_servers. Zed hot-reloads the file on save.

claude mcp add web-eval-agent -- uvx web-eval-agent

One-line command. Verify with claude mcp list; remove with claude mcp remove.

Use cases

Practical recipes: web-eval-agent

Smoke-test a web deploy with web-eval-agent

👤 Solo devs, small teams without a Playwright suite ⏱ ~10 min beginner

When to use: You deployed and want a quick "did I break anything obvious" check.

Prerequisites

Free API key from operative.sh/mcp — Sign up, copy the key
Playwright deps — npx playwright install (the MCP will prompt if missing)

Flow

Describe the test

On staging.example.com, verify I can: sign up with a new email, create a project, log out. Report what fails.

→ Pass/fail with screenshots

Dig into failures

For the failed step, show the console errors and the network request that returned 500.

→ Stack-level evidence

Result: Post-deploy confidence in 2 minutes.

Pitfalls

Test accounts clutter your prod DB — Always run against staging; if prod, use a dedicated QA account and clean up

Pairs with: sentry

Exploratory UX evaluation of a new flow

👤 Designers, PMs ⏱ ~20 min intermediate

When to use: You want an outside perspective on a flow without scheduling user tests.

Flow

Describe user intent, not steps

As a first-time user, try to share a project with a colleague. Note every friction point.

→ Free-form UX critique with screenshots of each point of confusion

Contrast with the happy path

Now do the same flow as a power user who knows the UI. How much faster? What confused the noob but not the expert?

→ Comparative friction map

Result: Cheap UX heuristics before putting real users in front of it.

Test features behind login with persisted browser state

👤 Anyone testing authenticated flows ⏱ ~15 min intermediate

When to use: Your feature requires login; you don't want the agent handling your password.

Flow

Seed the session

Call setup_browser_state opening https://app.example.com/login. I'll sign in myself.

→ Interactive browser opens; you log in; session saved

Run the test using the saved state

Test the billing settings page: load it, verify the current plan shows, try downgrading.

→ Test runs with your authenticated session

Result: Authed testing without sharing credentials with the agent.

Combinations

Combine with other MCPs for 10x efficiency

web-eval-agent + sentry

Run an eval, any new errors go to Sentry for post-hoc review

Run the signup eval, then check Sentry for new error events captured during that window.

web-eval-agent + playwright

Prototype with web-eval-agent, harden into Playwright for CI

Convert the working web-eval-agent test into a Playwright spec I can run in CI.

Tools

What this MCP exposes

Tool	Input	When to call	Cost
web_eval_agent	url: str, task: str, headless_browser?: bool	Any natural-language web test	LLM calls + browser time
setup_browser_state	url?: str	Once per service, to persist logged-in state	0
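To make the input column concrete, here is a sketch of the JSON-RPC message an MCP client would send to invoke web_eval_agent. The tools/call method and the name/arguments layout come from the MCP specification; the argument names come from the table above. build_tool_call is a hypothetical helper, and in practice your coding agent builds this request for you.

```python
import json


def build_tool_call(request_id, url, task, headless_browser=None):
    """Build an MCP tools/call request for the web_eval_agent tool."""
    arguments = {"url": url, "task": task}
    # headless_browser is optional in the table, so send it only when set.
    if headless_browser is not None:
        arguments["headless_browser"] = headless_browser
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "web_eval_agent", "arguments": arguments},
    }


request = build_tool_call(
    1,
    "https://staging.example.com",
    "Sign up with a new email, create a project, log out. Report what fails.",
    headless_browser=True,
)
payload = json.dumps(request)
```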

Cost & limits

Running costs

API quota: Free tier from operative.sh
Tokens per call: A full eval can use 5-20k tokens (screenshots are described in text)
Price: Free for low volume
Tip: For repetitive tests, graduate them to Playwright; use web-eval-agent for exploration
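As a rough budgeting aid, the 5-20k range above can be scaled to a batch of evals. This is simple arithmetic on the quoted figures, not a measurement; estimate_tokens is a hypothetical helper.

```python
def estimate_tokens(evals, low_per_eval=5_000, high_per_eval=20_000):
    """Return the (low, high) token range for a number of full evals."""
    return evals * low_per_eval, evals * high_per_eval


low, high = estimate_tokens(30)  # 30 evals -> (150000, 600000)
```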

Security

Permissions, secrets, blast radius

Credential storage: operative.sh API key in env; browser state saved locally
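The install snippets on this page do not show where the key goes. One common pattern is an "env" map on the server entry, which Claude Desktop-style mcpServers configs accept. The sketch below builds such an entry from the current environment. The variable name OPERATIVE_API_KEY is an assumption, not confirmed by this page, and server_entry_with_key is a hypothetical helper.

```python
import os


def server_entry_with_key(env_var="OPERATIVE_API_KEY", environ=None):
    """Build an mcpServers entry that passes the API key via "env".

    env_var is an assumed name; check the project's README for the real one.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return {
        "command": "uvx",
        "args": ["web-eval-agent"],
        "env": {env_var: key},
    }
```

Note that the resulting config file then contains the key in plain text, so keep it out of version control.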

Data egress: Target sites + operative.sh for eval orchestration

Original project is archived/discontinued; team now at withrefresh.com. Still functional but no new features expected.

Troubleshooting

Common errors and fixes

Browser fails to launch

Install Playwright deps: npx playwright install-deps

Session keeps expiring

Some sites rotate cookies; re-run setup_browser_state. Or use Playwright's storageState for finer control
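If you go the storageState route, a quick way to tell whether a saved session has gone stale is to check cookie expiry in the exported JSON. The sketch below assumes a file in Playwright's storageState format (a "cookies" list whose "expires" field is a Unix timestamp in seconds, or -1 for session cookies). Whether setup_browser_state saves its state in this format is not stated here, so treat this as applying to a file you exported with Playwright yourself. expired_cookies is a hypothetical helper.

```python
import time


def expired_cookies(storage_state, now=None):
    """Return names of cookies in a storageState dict that have expired."""
    now = time.time() if now is None else now
    expired = []
    for cookie in storage_state.get("cookies", []):
        expires = cookie.get("expires", -1)
        # -1 marks a session cookie, which has no fixed expiry time.
        if expires != -1 and expires <= now:
            expired.append(cookie["name"])
    return expired


state = {
    "cookies": [
        {"name": "session", "expires": 1_000},
        {"name": "csrf", "expires": -1},
        {"name": "remember_me", "expires": 5_000},
    ],
    "origins": [],
}
stale = expired_cookies(state, now=2_000)
```

A non-empty result suggests re-running the login step before the next test.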

Agent misunderstands the task

Be specific: URLs, selectors or text to look for, expected outcomes

Alternatives

web-eval-agent compared with alternatives

Alternative	When to use	Trade-offs
Playwright MCP	You want scriptable, reproducible tests	You write the code
Browserbase MCP	You need cloud-hosted browsers for CI	Paid per minute
