● Community · AltimateAI ⚡ Ready to use

data-engineering-skills

Author: AltimateAI · AltimateAI/data-engineering-skills

9 Claude Code skills for analytics engineering: 7 dbt workflows + 2 Snowflake query optimizers. 53% pass on real dbt tasks, 84% on Snowflake tuning.

Skills for the daily grind of analytics engineering. dbt skills cover creating, debugging, testing, documenting, migrating, refactoring, and incremental models. Snowflake skills find expensive queries and optimize either by text or by query_id. Philosophy: 'Read before you write. Build after you write. Verify your output.'

Why use it

Core features

Live demo

What it looks like in practice


Installation

Choose your client

~/Library/Application Support/Claude/claude_desktop_config.json  · Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Restart the app after saving.

~/.cursor/mcp.json · .cursor/mcp.json
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Cursor uses the same mcpServers format as Claude Desktop. Project-level config takes precedence over global.

VS Code → Cline → MCP Servers → Edit
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then choose "Edit Configuration".

~/.codeium/windsurf/mcp_config.json
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Same format as Claude Desktop. Restart Windsurf for it to take effect.

~/.continue/config.json
{
  "mcpServers": [
    {
      "name": "data-engineering-skill",
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ]
    }
  ]
}

Continue uses an array of server objects rather than a map.

~/.config/zed/settings.json
{
  "context_servers": {
    "data-engineering-skill": {
      "command": {
        "path": "git",
        "args": [
          "clone",
          "https://github.com/AltimateAI/data-engineering-skills",
          "~/.claude/skills/data-engineering-skills"
        ]
      }
    }
  }
}

Add it under context_servers. Zed hot-reloads on save.

claude mcp add data-engineering-skill -- git clone https://github.com/AltimateAI/data-engineering-skills ~/.claude/skills/data-engineering-skills

One command does it. Verify with claude mcp list; uninstall with claude mcp remove.

Use cases

Hands-on usage: data-engineering-skills

Debug a failing dbt model without thrashing

👤 Analytics engineers facing a red CI run · ⏱ ~20 min · intermediate

When to use: dbt run just failed with a cryptic error and you don't know if it's schema, lineage, or SQL.

Prerequisites
  • dbt project accessible — cd into your dbt repo so Claude can see models/
  • Skill installed — git clone https://github.com/AltimateAI/data-engineering-skills ~/.claude/skills/data-engineering-skills
Steps
  1. Feed Claude the error + model
    Use debugging-dbt-errors. Here's the stderr and models/marts/fct_orders.sql. Diagnose the root cause — don't guess.
    → Claude reads upstream refs, diagnoses in order: schema → lineage → SQL
  2. Apply the fix and verify
    Apply the fix and run dbt build --select fct_orders+. Show me the before/after row counts.
    → Clean run + row count verification

Result: Green CI plus a note of the root cause so it doesn't recur.

Watch out for
  • Fixing a symptom downstream when the bug is upstream — the skill enforces an upstream-first diagnosis; don't skip the lineage step
Pairs with: bigquery-server · github
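The --select fct_orders+ selector in step 2 rebuilds the model plus everything downstream of it. A minimal sketch of that expansion over a toy lineage graph (model names are invented; this is not dbt's actual selector implementation):

```python
# Toy sketch of dbt's "model+" selector: a model plus all downstream
# descendants. Lineage edges and model names are made up for illustration.
from collections import deque

# parent -> children (models that ref() the parent)
LINEAGE = {
    "stg_orders": ["fct_orders"],
    "stg_payments": ["fct_orders"],
    "fct_orders": ["agg_daily_revenue", "mart_finance"],
    "agg_daily_revenue": [],
    "mart_finance": [],
}

def select_downstream(model: str) -> set:
    """Return the model and everything downstream of it ("model+")."""
    seen, queue = {model}, deque([model])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(sorted(select_downstream("fct_orders")))
# → ['agg_daily_revenue', 'fct_orders', 'mart_finance']
```

Building downstream of the fixed model is what catches knock-on breakage before CI does.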

Find and fix your top expensive Snowflake queries

👤 Analytics leads with a climbing Snowflake bill · ⏱ ~60 min · intermediate

When to use: Finance flagged the Snowflake bill and you need to cut it without breaking dashboards.

Prerequisites
  • Snowflake role with ACCOUNT_USAGE access — ACCOUNTADMIN typically, or a dedicated cost role
Steps
  1. Identify the worst offenders
    Use finding-expensive-queries to list the top 20 queries in the past 30 days by credit cost. Group by app/user.
    → Ranked table with credits, runtime, warehouse
  2. Optimize each top offender
    For the top offender, use optimizing-query-by-id <query_id>. Propose rewrites with estimated savings.
    → Rewritten SQL + before/after explain plan
  3. Validate and deploy
    Run the rewrite in a test warehouse — confirm the same row count and shape before we swap.
    → Safe swap candidate

Result: A prioritized list of fixes with measurable $ savings.

Watch out for
  • Rewrites can change row counts silently — always diff before deploying; the skill enforces this
Pairs with: bigquery-server
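The ranked table from step 1 boils down to a sort by credit cost. A sketch over invented rows — the real skill reads SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY; nothing here is its actual output:

```python
# Sketch of the ranking finding-expensive-queries produces.
# These rows are invented for illustration.
rows = [
    {"query_id": "q1", "user": "dbt_prod", "credits": 12.4, "runtime_s": 310},
    {"query_id": "q2", "user": "looker",   "credits": 48.9, "runtime_s": 95},
    {"query_id": "q3", "user": "adhoc",    "credits": 3.1,  "runtime_s": 700},
]

def top_offenders(rows, n=20):
    """Rank queries by credit cost, most expensive first."""
    return sorted(rows, key=lambda r: r["credits"], reverse=True)[:n]

for r in top_offenders(rows, n=2):
    print(f'{r["query_id"]:<4} {r["user"]:<10} {r["credits"]:>6.1f} credits')
```

Note that the most expensive query is not the slowest one: credits depend on warehouse size, so sorting by runtime alone would point you at the wrong target.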

Migrate a pile of stored procs into dbt models

👤 Teams moving off legacy SQL to dbt · ⏱ ~90 min · advanced

When to use: You've inherited a warehouse of nested CTEs and want them as documented, tested dbt models.

Steps
  1. Point the skill at the source SQL
    Use migrating-sql-to-dbt. Here's proc_monthly_revenue.sql. Convert it to dbt models with refs, documentation, and at least 2 tests per model.
    → One or more .sql files, schema.yml with docs and tests
  2. Build and verify
    dbt build the new models and compare row counts to the legacy output.
    → Row counts match within tolerance

Result: Legacy logic lives on as testable dbt models.

Watch out for
  • Hidden side effects in the proc (UPDATEs) — the skill flags side effects; separate them out, don't blindly convert
Pairs with: github
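Step 2's "row counts match within tolerance" check can be sketched as below. The 0.1% relative tolerance is an arbitrary example, not a skill default; counts would come from the legacy output and the new dbt model:

```python
# Sketch of a row-count parity check between legacy and migrated output.
# The tolerance value is illustrative, not a skill default.
def counts_match(legacy: int, migrated: int, rel_tol: float = 0.001) -> bool:
    """True if the migrated row count is within rel_tol of the legacy count."""
    if legacy == 0:
        return migrated == 0
    return abs(migrated - legacy) / legacy <= rel_tol

assert counts_match(1_000_000, 1_000_400)    # 0.04% drift: acceptable
assert not counts_match(1_000_000, 990_000)  # 1% drift: investigate
```

A relative tolerance matters because a fixed absolute threshold that is fine for a million-row fact table would hide real losses in a thousand-row dimension.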

Convert a slow full-refresh model to incremental

👤 Analytics engineers with long-running dbt runs · ⏱ ~45 min · advanced

When to use: A daily model has grown too big for full refresh.

Steps
  1. Analyze the model
    Use developing-incremental-models on models/events.sql. Pick a strategy (merge / insert_overwrite / delete+insert) and justify it.
    → Recommended strategy + unique_key + partition/cluster keys
  2. Implement and back-fill
    Apply the incremental config; outline a safe back-fill plan.
    → Model + back-fill steps

Result: Daily runs that finish in minutes, not hours.

Watch out for
  • unique_key can produce duplicates on late-arriving data — use merge and test for it
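A merge-strategy config like the one the skill might propose could look like the following sketch; the model, column names, and 3-day late-data window are illustrative only, not the skill's actual output:

```sql
-- Illustrative incremental config; model/column names are made up.
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id',
    cluster_by=['event_date']
) }}

select *
from {{ ref('stg_events') }}
{% if is_incremental() %}
  -- Reprocess a trailing window so late-arriving rows are merged, not duplicated
  where event_date >= dateadd(day, -3, (select max(event_date) from {{ this }}))
{% endif %}
```

The merge strategy plus a reprocessing window is what defuses the late-data duplicate risk flagged above.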

Combos

Pair it with other MCPs for 10x leverage

data-engineering-skill + bigquery-server

Apply the same optimize-by-id pattern to BigQuery expensive queries

Adapt finding-expensive-queries for BigQuery INFORMATION_SCHEMA.JOBS and list the top 20.
data-engineering-skill + github

Open a PR per migrated model so each is reviewable

For every migrated model, open a GitHub PR with the dbt test output attached.

Tools

Capabilities this MCP exposes

Tool | Input | When to invoke | Cost
creating-dbt-models | model spec | New model | 0
debugging-dbt-errors | error log, model | CI or local run failed | 0
testing-dbt-models | model | Untested model | 0
documenting-dbt-models | model | Undocumented model | 0
migrating-sql-to-dbt | legacy SQL | Legacy migration | 0
refactoring-dbt-models | model | Hard-to-read model | 0
developing-incremental-models | full-refresh model | Runtime too long | 0
finding-expensive-queries | lookback window | Cost hunt | ACCOUNT_USAGE query
optimizing-query-text | SQL text | Know the SQL, not the id | 0
optimizing-query-by-id | query_id | Have the id from the UI | 1 explain

Cost & limits

What it costs to run

API quota
Snowflake queries cost credits like any other — ACCOUNT_USAGE reads are cheap
Tokens per invocation
5–15k per dbt skill invocation
Price
Free skill
Tip
Run finding-expensive-queries once weekly, not on every session

Security

Permissions, secrets, blast radius

Least privilege: dbt: read + write to your project; Snowflake: ACCOUNT_USAGE for the cost skills
Credential storage: dbt profiles.yml / Snowflake key-pair in env vars; the skill doesn't store secrets
Data egress: none from the skill directly
Never grant: SYSADMIN to the Claude session unless absolutely needed

Troubleshooting

Common errors and fixes

dbt compile succeeds, run fails with column not found

Stale lineage — run dbt deps, dbt clean, then dbt build --select model+

finding-expensive-queries returns nothing

ACCOUNT_USAGE has ~45 min latency; also confirm the role has access to SNOWFLAKE.ACCOUNT_USAGE

Verify: SHOW GRANTS TO ROLE <role>

Alternatives

data-engineering-skills vs. the alternatives

Alternative | When to use it instead | Trade-off
dbt Cloud IDE | You prefer a managed UI over the terminal | No Claude in the loop
SQL query optimizers (Select.dev, etc.) | You want visual query plans | Separate tool, separate context

More

Resources

📖 Read the official README on GitHub

🐙 Browse the open issues

🔍 Browse all 400+ MCP servers and Skills