
data-engineering-skills

by AltimateAI · AltimateAI/data-engineering-skills

9 Claude Code skills for analytics engineering: 7 dbt workflows + 2 Snowflake query optimizers. 53% pass on real dbt tasks, 84% on Snowflake tuning.

Skills for the daily grind of analytics engineering. dbt skills cover creating, debugging, testing, documenting, migrating, refactoring, and incremental models. Snowflake skills find expensive queries and optimize either by text or by query_id. Philosophy: 'Read before you write. Build after you write. Verify your output.'


Installation

Choose a client

~/Library/Application Support/Claude/claude_desktop_config.json  · Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Restart after saving.

~/.cursor/mcp.json · .cursor/mcp.json
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Cursor uses the same mcpServers schema as Claude Desktop. The project config takes precedence over the global one.

VS Code → Cline → MCP Servers → Edit
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then "Edit Configuration".

~/.codeium/windsurf/mcp_config.json
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Same format as Claude Desktop. Restart Windsurf to apply.

~/.continue/config.json
{
  "mcpServers": [
    {
      "name": "data-engineering-skill",
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ]
    }
  ]
}

Continue uses an array of server objects rather than a map.

~/.config/zed/settings.json
{
  "context_servers": {
    "data-engineering-skill": {
      "command": {
        "path": "git",
        "args": [
          "clone",
          "https://github.com/AltimateAI/data-engineering-skills",
          "~/.claude/skills/data-engineering-skills"
        ]
      }
    }
  }
}

Add under context_servers. Zed reloads automatically.

claude mcp add data-engineering-skill -- git clone https://github.com/AltimateAI/data-engineering-skills ~/.claude/skills/data-engineering-skills

One-line command. Verify with claude mcp list. Remove with claude mcp remove.

Use cases

Real-world scenarios: data-engineering-skills

Debug a failing dbt model without thrashing

👤 Analytics engineers facing a red CI run ⏱ ~20 min intermediate

When to use: dbt run just failed with a cryptic error and you don't know if it's schema, lineage, or SQL.

Prerequisites
  • dbt project accessible — cd into your dbt repo so Claude can see models/
  • Skill installed — git clone https://github.com/AltimateAI/data-engineering-skills ~/.claude/skills/data-engineering-skills
Flow
  1. Feed Claude the error + model
    Use debugging-dbt-errors. Here's the stderr and models/marts/fct_orders.sql. Diagnose the root cause — don't guess.
    → Claude reads upstream refs, diagnoses in order: schema → lineage → SQL
  2. Apply the fix and verify
    Apply the fix and run dbt build --select fct_orders+. Show me the before/after row counts.
    → Clean run + row count verification

Outcome: Green CI plus a note of the root cause so it doesn't recur.

Pitfalls
  • Fixing a symptom downstream when the bug is upstream — the skill enforces an upstream-first diagnosis; don't skip the lineage step
Combine with: bigquery-server · github

Find and fix your top expensive Snowflake queries

👤 Analytics leads with a climbing Snowflake bill ⏱ ~60 min intermediate

When to use: Finance flagged the Snowflake bill and you need to cut it without breaking dashboards.

Prerequisites
  • Snowflake role with ACCOUNT_USAGE access — ACCOUNTADMIN typically, or a dedicated cost role
Flow
  1. Identify the worst offenders
    Use finding-expensive-queries to list the top 20 queries in the past 30 days by credit cost. Group by app/user.
    → Ranked table with credits, runtime, warehouse
  2. Optimize each top offender
    For the top offender, use optimizing-query-by-id <query_id>. Propose rewrites with estimated savings.
    → Rewritten SQL + before/after explain plan
  3. Validate and deploy
    Run the rewrite in a test warehouse — confirm same row count and shape before we swap.
    → Safe swap candidate

Outcome: A prioritized list of fixes with measurable $ savings.

Pitfalls
  • Rewrites can silently change row counts — always diff before deploying; the skill enforces this
Combine with: bigquery-server
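
The exact SQL behind finding-expensive-queries isn't documented on this page; the following is only a sketch of the kind of ACCOUNT_USAGE query it would run, using total runtime as a cost proxy (the proxy choice and the 30-day window are assumptions, not the skill's actual implementation):

```sql
-- Sketch only: rank the last 30 days of queries by total runtime,
-- a rough proxy for warehouse credits (warehouses bill per second of runtime).
SELECT
    query_hash,
    ANY_VALUE(user_name)            AS user_name,
    ANY_VALUE(warehouse_name)       AS warehouse_name,
    COUNT(*)                        AS executions,
    SUM(total_elapsed_time) / 1000  AS total_runtime_s
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY query_hash
ORDER BY total_runtime_s DESC
LIMIT 20;
```

Grouping by query_hash collapses repeated runs of the same statement, which is usually where the bulk of the spend hides.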

Migrate a pile of stored procs into dbt models

👤 Teams moving off legacy SQL to dbt ⏱ ~90 min advanced

When to use: You've inherited a warehouse of nested CTEs and want them as documented, tested dbt models.

Flow
  1. Point the skill at the source SQL
    Use migrating-sql-to-dbt. Here's proc_monthly_revenue.sql. Convert it to dbt models with refs, documentation, and at least 2 tests per model.
    → One or more .sql files, schema.yml with docs and tests
  2. Build and verify
    dbt build the new models and compare row counts to the legacy output.
    → Row counts match within tolerance

Outcome: Legacy logic lives on as testable dbt models.

Pitfalls
  • Hidden side effects in the proc (UPDATEs) — the skill flags side effects; separate them out, don't blindly convert
Combine with: github
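
To make the target shape concrete, here is what the converted model might look like for a hypothetical proc_monthly_revenue.sql (the model name, ref target, and column names are invented for illustration):

```sql
-- models/marts/fct_monthly_revenue.sql (illustrative output shape)
-- The proc's SELECT logic survives; hard-coded table names become
-- ref()/source() calls, and tests/docs move into an accompanying schema.yml.
WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
)
SELECT
    DATE_TRUNC('month', order_date) AS revenue_month,
    SUM(amount)                     AS revenue
FROM orders
GROUP BY 1
```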

Convert a slow full-refresh model to incremental

👤 Analytics engineers with long-running dbt runs ⏱ ~45 min advanced

When to use: A daily model has grown too big for full refresh.

Flow
  1. Analyze the model
    Use developing-incremental-models on models/events.sql. Pick a strategy (merge / insert_overwrite / delete+insert) and justify it.
    → Recommended strategy + unique_key + partition / cluster keys
  2. Implement and back-fill
    Apply the incremental config; outline a safe back-fill plan.
    → Model + back-fill steps

Outcome: Daily runs that finish in minutes, not hours.

Pitfalls
  • Late-arriving data can create duplicates on unique_key — use the merge strategy and test for uniqueness
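
As a sketch of what step 2 might produce, assuming a merge strategy keyed on a hypothetical event_id column with a loaded_at timestamp (both names are assumptions, not part of the skill's output):

```sql
-- models/events.sql (sketch; event_id and loaded_at are assumed columns)
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id'
  )
}}

SELECT *
FROM {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- Scan only rows newer than what's already built, minus a small lookback
  -- window so late-arriving events get merged rather than duplicated.
  WHERE loaded_at >= (SELECT DATEADD('day', -3, MAX(loaded_at)) FROM {{ this }})
{% endif %}
```

The merge strategy plus the lookback window is what guards against the duplicate-on-late-data pitfall above.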

Combinations

Combine with other MCPs for a 10x effect

data-engineering-skill + bigquery-server

Apply the same optimize-by-id pattern to BigQuery expensive queries

Adapt finding-expensive-queries for BigQuery INFORMATION_SCHEMA.JOBS and list the top 20.
data-engineering-skill + github

Open a PR per migrated model so each is reviewable

For every migrated model, open a GitHub PR with dbt test output attached.

Tools

What this MCP provides

Tool | Inputs | When to call | Cost
creating-dbt-models | model spec | New model | 0
debugging-dbt-errors | error log, model | CI or local run failed | 0
testing-dbt-models | model | Untested model | 0
documenting-dbt-models | model | Undocumented model | 0
migrating-sql-to-dbt | legacy SQL | Legacy migration | 0
refactoring-dbt-models | model | Hard-to-read model | 0
developing-incremental-models | full-refresh model | Runtime too long | 0
finding-expensive-queries | lookback window | Cost hunt | ACCOUNT_USAGE query
optimizing-query-text | SQL text | Know the SQL, not the id | 0
optimizing-query-by-id | query_id | Have the id from the UI | 1 explain

Cost and limits

What it costs

API quota: Snowflake queries cost credits like any other — ACCOUNT_USAGE reads are cheap
Tokens per call: 5–15k per dbt skill invocation
Money: Free skill
Tip: Run finding-expensive-queries once weekly, not on every session

Security

Permissions, secrets, blast radius

Minimal scopes: dbt: read + write to your project; Snowflake: ACCOUNT_USAGE for the cost skills
Credential storage: dbt profiles.yml / Snowflake key pair in env; the skill doesn't store secrets
Outbound traffic: none from the skill directly
Never grant: SYSADMIN to the Claude session unless absolutely needed

Troubleshooting

Common errors and fixes

dbt compile succeeds, run fails with column not found

Stale lineage — run dbt deps, then dbt clean, then dbt build --select model+

finding-expensive-queries returns nothing

ACCOUNT_USAGE views lag by up to ~45 minutes; also confirm the role has access to SNOWFLAKE.ACCOUNT_USAGE

Check with: SHOW GRANTS TO ROLE <role>
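
If the grant turns out to be missing, the standard fix is Snowflake's imported-privileges grant, run as ACCOUNTADMIN (the role name here is a placeholder):

```sql
-- Gives <role> read access to the SNOWFLAKE.ACCOUNT_USAGE views
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO ROLE <role>;
```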

Alternatives

data-engineering-skills compared

Alternative | When to use | Trade-off
dbt Cloud IDE | You prefer a managed UI over the terminal | No Claude in the loop
SQL query optimizers (Select.dev, etc.) | You want visual query plans | Separate tool, separate context

More

Resources

📖 Read the official README on GitHub

🐙 Open issues

🔍 All 400+ MCP servers and Skills