web-scraper (Claude Skill): Installation & Live Demo

Why use it

Core features

Phased reconnaissance — APIs > sitemaps > static HTML > browser
Template picker: Cheerio for static, Playwright for JS-heavy
Apify Actor TypeScript-first workflow
Selective stealth: only when protection signals detected
Session recording + replay for debugging
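
The phased reconnaissance order above reads as a priority rule: try the cheapest source first and stop at the first one that works. The sketch below is a hypothetical illustration of that rule, not the skill's actual code; the `ReconSignals` shape and the `pickExtractionPath` name are assumptions made for this example.

```typescript
// Hypothetical signals a recon pass might collect; field names are illustrative only.
interface ReconSignals {
  hasJsonApi: boolean;        // clean XHR/fetch endpoints were observed
  hasSitemap: boolean;        // sitemap.xml lists the target pages
  staticHtmlHasData: boolean; // target fields are present in the raw HTML
}

type ExtractionPath = "api" | "sitemap" | "static-html" | "browser";

// Walk the phases from cheapest to most expensive and stop at the first hit.
function pickExtractionPath(signals: ReconSignals): ExtractionPath {
  if (signals.hasJsonApi) return "api";
  if (signals.hasSitemap) return "sitemap";
  if (signals.staticHtmlHasData) return "static-html";
  return "browser"; // last resort: render the page in a real browser
}
```

With this ordering, a browser is chosen only when none of the cheaper signals is present.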

Live demo

See it in action

Installation

Choose your client

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json · Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "web-scraper-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/yfe404/web-scraper",
        "~/.claude/skills/web-scraper"
      ],
      "_inferred": true
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Save, then restart the app.

Global: ~/.cursor/mcp.json · Project: .cursor/mcp.json

{
  "mcpServers": {
    "web-scraper-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/yfe404/web-scraper",
        "~/.claude/skills/web-scraper"
      ],
      "_inferred": true
    }
  }
}

Cursor uses the same mcpServers format as Claude Desktop. Project-level configuration takes precedence over the global one.

VS Code → Cline → MCP Servers → Edit

{
  "mcpServers": {
    "web-scraper-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/yfe404/web-scraper",
        "~/.claude/skills/web-scraper"
      ],
      "_inferred": true
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then choose "Edit Configuration".

~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "web-scraper-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/yfe404/web-scraper",
        "~/.claude/skills/web-scraper"
      ],
      "_inferred": true
    }
  }
}

Same format as Claude Desktop. Restart Windsurf for the change to take effect.

~/.continue/config.json

{
  "mcpServers": [
    {
      "name": "web-scraper-skill",
      "command": "git",
      "args": [
        "clone",
        "https://github.com/yfe404/web-scraper",
        "~/.claude/skills/web-scraper"
      ]
    }
  ]
}

Continue uses an array of server objects rather than a map.

~/.config/zed/settings.json

{
  "context_servers": {
    "web-scraper-skill": {
      "command": {
        "path": "git",
        "args": [
          "clone",
          "https://github.com/yfe404/web-scraper",
          "~/.claude/skills/web-scraper"
        ]
      }
    }
  }
}

Add the entry under context_servers. Zed hot-reloads on save.

claude mcp add web-scraper-skill -- git clone https://github.com/yfe404/web-scraper ~/.claude/skills/web-scraper

A single command does it. Verify with claude mcp list; remove with claude mcp remove.

Use cases

In practice: web-scraper

Scrape a static listing site into a structured dataset

👤 Data engineers pulling public data (directories, price lists, public records) ⏱ ~45 min intermediate

When to use: You need a dataset from a public site that doesn't have an API.

Prerequisites

Skill installed — git clone https://github.com/yfe404/web-scraper ~/.claude/skills/web-scraper
Node 20 for Apify Actors — nvm install 20

Steps

Let the skill do recon

Use web-scraper. Target: https://example.com/listings. I want name + URL + category. Recon first — tell me the cheapest extraction path.

→ Skill reports: 'sitemap.xml available, use Cheerio'

Scaffold the Apify Actor

Scaffold a TypeScript Apify Cheerio actor for that extraction.

→ Actor tree + main.ts ready to run

Run and iterate

Run locally on 10 pages; tighten selectors if needed.

→ Clean JSON output

Result: An Apify Actor you can deploy for scheduled scrapes.
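
When recon reports that a sitemap is available, the first practical step is usually to turn sitemap.xml into a list of page URLs to crawl. The sketch below shows one way to do that with no dependencies. The function names and the `/listings/` prefix are assumptions for illustration, and a real actor might prefer a proper XML parser over a regex.

```typescript
// Extract every <loc> URL from a sitemap XML string.
// A regex is enough for this sketch; it does not handle CDATA or nested sitemap indexes.
function extractSitemapUrls(xml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    // Sitemaps escape "&" as "&amp;", so undo that before using the URL.
    urls.push(match[1].replace(/&amp;/g, "&"));
  }
  return urls;
}

// Keep only URLs whose path starts with the given prefix, e.g. "/listings/".
function filterByPathPrefix(urls: string[], prefix: string): string[] {
  return urls.filter((u) => new URL(u).pathname.startsWith(prefix));
}
```

The filtered list can then seed the crawler's request queue, so only listing pages are fetched.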

Pitfalls

Jumping to Playwright when Cheerio would do: trust the recon, because a browser-based run costs roughly 10x more for no benefit.

Pairs with: apify · filesystem

Discover and use a site's undocumented JSON API instead of HTML

👤 Scraper devs who want reliability ⏱ ~30 min intermediate

When to use: The page is a SPA whose HTML is messy, but its XHR calls return clean JSON.

Steps

Run API-discovery phase

Use web-scraper phase 1 — API discovery on https://example.com/app. Enumerate XHR/fetch endpoints.

→ List of endpoints with observed payloads

Build the JSON-based actor

Generate an actor that hits those endpoints directly with auth as needed.

→ Lightweight fetch-based actor

Result: A far more stable scrape than HTML parsing.

Pitfalls

Private or session-authenticated APIs break when the token rotates: plan token-refresh logic, or fall back to a browser flow.
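
One way to plan for token rotation is to retry once with a fresh token when the endpoint answers 401. The sketch below is a minimal illustration under stated assumptions: `callApi` and `refreshToken` are hypothetical caller-supplied functions, and the site is assumed to signal an expired token with HTTP 401.

```typescript
// Minimal response shape so the sketch does not depend on a specific HTTP client.
interface ApiResponse {
  status: number;
  body: unknown;
}

// Hypothetical caller-supplied functions: how to call the endpoint, how to obtain a fresh token.
type CallApi = (url: string, token: string) => Promise<ApiResponse>;
type RefreshToken = () => Promise<string>;

// Call the endpoint; on 401, refresh the token once and retry.
async function fetchWithRefresh(
  url: string,
  token: string,
  callApi: CallApi,
  refreshToken: RefreshToken
): Promise<{ response: ApiResponse; token: string }> {
  const first = await callApi(url, token);
  if (first.status !== 401) return { response: first, token };

  const fresh = await refreshToken();
  const second = await callApi(url, fresh);
  if (second.status === 401) {
    // The refresh did not help: let the caller fall back to a browser flow.
    throw new Error(`Still unauthorized after token refresh: ${url}`);
  }
  return { response: second, token: fresh };
}
```

Returning the token alongside the response lets the caller reuse the refreshed token for later requests instead of refreshing on every call.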

Combinations

Pair it with other MCPs for 10x leverage

web-scraper-skill + apify

Deploy the scaffolded actor to Apify for scheduled runs

Deploy this actor to my Apify account and schedule it daily.

web-scraper-skill + filesystem

Keep the actor code in-repo alongside the consuming app

Scaffold into scrapers/ and commit with the main project.

Tools

Capabilities this MCP exposes

Tool	Input parameters	When to call
recon	url	Always first
scaffold_actor	template (cheerio|playwright), target	After recon picks template
record_session	url	Debugging dynamic sites
run_local	actor path, limit	Iteration phase

Cost and limits

What it costs to run

API quota: Apify has its own compute and proxy quotas
Tokens per call: Moderate, driven by scaffold and iteration loops
Cost: The skill is free; Apify costs are separate
Tip: Prefer Cheerio, one of the cheapest run profiles on Apify

Security

Permissions, secrets, blast radius

Least privilege: Apify API token with actor:read + actor:write

Credential storage: APIFY_TOKEN in an environment variable

Data egress: The sites you target, plus the Apify platform

Respect robots.txt and each site's Terms of Service. The skill is not a tool for getting around ToS; use it only on data you're allowed to collect.
Public, non-authenticated data only unless you have explicit permission.
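
As a rough illustration of checking robots.txt before a crawl, the sketch below tests a path against the rules in the `User-agent: *` group. It is deliberately simplified and is an assumption-laden example rather than a complete parser: it ignores agent-specific groups, does not support `*` or `$` wildcards, and the `isPathAllowed` name is hypothetical. The longest matching rule wins, and Allow wins a tie.

```typescript
// Simplified robots.txt check: only the "User-agent: *" group, plain path prefixes.
function isPathAllowed(robotsTxt: string, path: string): boolean {
  const rules: { allow: boolean; prefix: string }[] = [];
  let groupAgents: string[] = [];
  let groupHasRules = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // A user-agent line that follows rules starts a new group.
      if (groupHasRules) {
        groupAgents = [];
        groupHasRules = false;
      }
      groupAgents.push(value.toLowerCase());
    } else if (field === "allow" || field === "disallow") {
      groupHasRules = true;
      // An empty Disallow means nothing is disallowed, so skip empty values.
      if (groupAgents.includes("*") && value !== "") {
        rules.push({ allow: field === "allow", prefix: value });
      }
    }
  }

  // Pick the longest matching prefix; on equal length, prefer Allow.
  let best: { allow: boolean; prefix: string } | null = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.prefix)) continue;
    if (
      best === null ||
      rule.prefix.length > best.prefix.length ||
      (rule.prefix.length === best.prefix.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow;
}
```

A production crawler should rely on a maintained robots.txt library and still read the site's Terms of Service, which robots.txt does not replace.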

Troubleshooting

Common errors and fixes

Cheerio returns empty selectors

Content is JS-rendered — rerun recon, expect Playwright template

Playwright times out

Raise the navigation timeout; consider waiting for a specific selector instead of networkidle
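
The fix above might look like the following sketch. `PageLike` is a hypothetical structural subset of Playwright's `Page`, declared here so the example type-checks without the dependency; `goto` with `waitUntil` and `timeout`, and `waitForSelector` with `timeout`, are the Playwright calls it mirrors. The 60-second default is an arbitrary choice for illustration.

```typescript
// Structural subset of Playwright's Page (assumption: a real Page object fits this shape).
interface PageLike {
  goto(
    url: string,
    options?: { timeout?: number; waitUntil?: "load" | "domcontentloaded" | "networkidle" | "commit" }
  ): Promise<unknown>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

async function openAndWaitFor(
  page: PageLike,
  url: string,
  selector: string,
  timeoutMs: number = 60000
): Promise<void> {
  // Return from navigation once the DOM is parsed instead of waiting for network silence.
  await page.goto(url, { waitUntil: "domcontentloaded", timeout: timeoutMs });
  // Then wait for the element that actually carries the data.
  await page.waitForSelector(selector, { timeout: timeoutMs });
}
```

Waiting on the data-bearing selector tends to be more reliable than networkidle on pages that keep polling or streaming in the background.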

403 / bot-block page

Stop and reconsider. This is the legit signal to re-check ToS, not a cue to escalate stealth.

Alternatives

web-scraper compared with other options

Alternative	When to use it instead	Trade-off
Direct Apify console	You already know which template you need	No recon phase
firecrawl	You just need markdown of a page, not structured extraction	No actor scaffolding
