
data-engineering-skills

by AltimateAI · AltimateAI/data-engineering-skills

9 Claude Code skills for analytics engineering: 7 dbt workflows + 2 Snowflake query optimizers. 53% pass rate on real dbt tasks, 84% on Snowflake tuning.

Skills for the daily grind of analytics engineering. dbt skills cover creating, debugging, testing, documenting, migrating, refactoring, and incremental models. Snowflake skills find expensive queries and optimize either by text or by query_id. Philosophy: 'Read before you write. Build after you write. Verify your output.'


Install

Choose your client

~/Library/Application Support/Claude/claude_desktop_config.json  · Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Restart after saving.

~/.cursor/mcp.json · .cursor/mcp.json
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Cursor uses the same mcpServers schema as Claude Desktop. The project config overrides the global one.

VS Code → Cline → MCP Servers → Edit
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then "Edit Configuration".

~/.codeium/windsurf/mcp_config.json
{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Same structure as Claude Desktop. Restart Windsurf to apply.

~/.continue/config.json
{
  "mcpServers": [
    {
      "name": "data-engineering-skill",
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ]
    }
  ]
}

Continue uses an array of server objects instead of a map.

~/.config/zed/settings.json
{
  "context_servers": {
    "data-engineering-skill": {
      "command": {
        "path": "git",
        "args": [
          "clone",
          "https://github.com/AltimateAI/data-engineering-skills",
          "~/.claude/skills/data-engineering-skills"
        ]
      }
    }
  }
}

Add it under context_servers. Zed reloads on save.

claude mcp add data-engineering-skill -- git clone https://github.com/AltimateAI/data-engineering-skills ~/.claude/skills/data-engineering-skills

One-liner. Verify with claude mcp list. Remove with claude mcp remove.

Use cases

Hands-on usage: data-engineering-skills

Debug a failing dbt model without thrashing

👤 Analytics engineers facing a red CI run · ⏱ ~20 min · intermediate

When to use: dbt run just failed with a cryptic error and you don't know if it's schema, lineage, or SQL.

Prerequisites
  • dbt project accessible — cd into your dbt repo so Claude can see models/
  • Skill installed — git clone https://github.com/AltimateAI/data-engineering-skills ~/.claude/skills/data-engineering-skills
Steps
  1. Feed Claude the error + model
    Use debugging-dbt-errors. Here's the stderr and models/marts/fct_orders.sql. Diagnose the root cause — don't guess.
    → Claude reads upstream refs, diagnoses in order: schema → lineage → SQL
  2. Apply the fix and verify (a row-count sketch follows this list)
    Apply the fix and run dbt build --select fct_orders+. Show me the before/after row counts.
    → Clean run + row count verification
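
A minimal sketch of that row-count check, assuming you snapshotted the table before applying the fix; the schema and backup table names are hypothetical:

-- Hedged sketch: compare pre-fix and post-fix row counts.
-- analytics.fct_orders_backup is a hypothetical snapshot taken before the fix.
select 'before' as run, count(*) as row_count from analytics.fct_orders_backup
union all
select 'after'  as run, count(*) as row_count from analytics.fct_orders;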

Result: Green CI plus a note of the root cause so it doesn't recur.

Pitfalls
  • Fixing a symptom downstream when the bug is upstream — The skill enforces an upstream-first diagnosis; don't skip the lineage step
Combine with: bigquery-server · github

Find and fix your top expensive Snowflake queries

👤 Analytics leads with a climbing Snowflake bill · ⏱ ~60 min · intermediate

When to use: Finance flagged the Snowflake bill and you need to cut it without breaking dashboards.

Prerequisites
  • Snowflake role with ACCOUNT_USAGE access — typically ACCOUNTADMIN, or a dedicated cost role
Steps
  1. Identify worst offenders (a sketch of the underlying query follows this list)
    Use finding-expensive-queries to list the top 20 queries in the past 30 days by credit cost. Group by app/user.
    → Ranked table with credits, runtime, warehouse
  2. Optimize each top one
    For the top offender, use optimizing-query-by-id <query_id>. Propose rewrites with estimated savings.
    → Rewritten SQL + before/after explain plan
  3. Validate and deploy
    Run the rewrite in a test warehouse — confirm same row count and shape before we swap.
    → Safe swap candidate
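
A minimal sketch of the kind of ACCOUNT_USAGE query step 1 relies on, assuming a role that can read SNOWFLAKE.ACCOUNT_USAGE. Elapsed time is used as a proxy for credit burn, since warehouse credits are metered per warehouse rather than per query:

-- Hedged sketch: top 20 queries of the last 30 days.
select
    query_id,
    user_name,
    warehouse_name,
    total_elapsed_time / 1000   as elapsed_seconds,       -- TOTAL_ELAPSED_TIME is in ms
    credits_used_cloud_services as cloud_services_credits
from snowflake.account_usage.query_history
where start_time >= dateadd('day', -30, current_timestamp())
order by total_elapsed_time desc
limit 20;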

Result: A prioritized list of fixes with measurable $ savings.

Pitfalls
  • Rewrites change row count silently — Always diff before deploying — the skill enforces this
Combine with: bigquery-server

Migrate a pile of stored procs into dbt models

👤 Teams moving off legacy SQL to dbt · ⏱ ~90 min · advanced

When to use: You've inherited a warehouse of nested CTEs and want them as documented, tested dbt models.

Steps
  1. Point the skill at the source SQL (a sketch of the target shape follows this list)
    Use migrating-sql-to-dbt. Here's proc_monthly_revenue.sql. Convert it to dbt models with refs, documentation, and at least 2 tests per model.
    → One or more .sql files, schema.yml with docs and tests
  2. Build and verify
    dbt build the new models and compare row counts to the legacy output.
    → Row counts match within tolerance
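
A minimal sketch of the shape such a conversion might take, assuming a hypothetical stg_orders staging model; the real output depends entirely on the proc's logic:

-- Hedged sketch: proc_monthly_revenue.sql recast as a dbt model.
-- stg_orders, order_date, and amount are hypothetical placeholders.
with orders as (
    select * from {{ ref('stg_orders') }}
)
select
    date_trunc('month', order_date) as revenue_month,
    sum(amount)                     as revenue
from orders
group by 1
-- pair with a schema.yml entry adding unique + not_null tests on revenue_month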

Result: Legacy logic lives as testable dbt models.

Pitfalls
  • Hidden side effects in the proc (UPDATEs) — The skill flags side effects — separate them out, don't blindly convert
Combine with: github

Convert a slow full-refresh model to incremental

👤 Analytics engineers with long-running dbt runs · ⏱ ~45 min · advanced

When to use: A daily model has grown too big for full refresh.

Steps
  1. Analyze the model
    Use developing-incremental-models on models/events.sql. Pick a strategy (merge / insert_overwrite / delete+insert) and justify.
    → Strategy + unique_key + partition / cluster keys recommended
  2. Implement and back-fill (a config sketch follows this list)
    Apply the incremental config; outline a safe back-fill plan.
    → Model + back-fill steps
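
A minimal sketch of what the resulting model header might look like, assuming a merge strategy with a hypothetical event_id unique key and an ingested_at watermark column:

-- Hedged sketch for models/events.sql; source and column names are placeholders.
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='event_id'
) }}

select *
from {{ source('raw', 'events') }}
{% if is_incremental() %}
  -- only process rows newer than what the target table already holds
  where ingested_at > (select max(ingested_at) from {{ this }})
{% endif %}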

Result: Daily runs that finish in minutes, not hours.

Pitfalls
  • unique_key gets duplicates on late data — Use merge and test it

Combinations

With other MCPs for 10x impact

data-engineering-skill + bigquery-server

Apply the same optimize-by-id pattern to BigQuery expensive queries

Adapt finding-expensive-queries for BigQuery INFORMATION_SCHEMA.JOBS and list top 20.
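
A minimal sketch of the adapted query, assuming the region-us JOBS_BY_PROJECT view; the $/TiB figure is an on-demand pricing assumption, so check your own contract:

-- Hedged sketch: BigQuery's rough equivalent of finding-expensive-queries.
select
  user_email,
  query,
  total_bytes_billed / pow(2, 40) * 6.25 as approx_usd  -- assumes ~$6.25/TiB on-demand
from `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
where creation_time >= timestamp_sub(current_timestamp(), interval 30 day)
  and job_type = 'QUERY'
order by total_bytes_billed desc
limit 20;
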
data-engineering-skill + github

Open a PR per migrated model so each is reviewable

For every migrated model, open a GitHub PR with dbt test output attached.

Tools

What this MCP provides

Tool                          | Inputs             | When to call             | Cost
creating-dbt-models           | model spec         | New model                | 0
debugging-dbt-errors          | error log, model   | CI or local run failed   | 0
testing-dbt-models            | model              | Untested model           | 0
documenting-dbt-models        | model              | Undocumented model       | 0
migrating-sql-to-dbt          | legacy SQL         | Legacy migration         | 0
refactoring-dbt-models        | model              | Hard-to-read model       | 0
developing-incremental-models | full-refresh model | Runtime too long         | 0
finding-expensive-queries     | lookback window    | Cost hunt                | ACCOUNT_USAGE query
optimizing-query-text         | SQL text           | Know the SQL, not the id | 0
optimizing-query-by-id        | query_id           | Have the id from the UI  | 1 explain

Costs & limits

What it costs to run

API quota
Snowflake queries cost credits like any other — ACCOUNT_USAGE reads are cheap
Tokens per call
5–15k per dbt skill invocation
Cost in €
Free skill
Tip
Run finding-expensive-queries once weekly, not on every session

Security

Permissions, secrets, scope

Minimal scopes: dbt: read + write to your project; Snowflake: ACCOUNT_USAGE for cost skills
Credential storage: dbt profiles.yml / Snowflake key pair in env; the skill doesn't store secrets
Data egress: none from the skill directly
Never grant: SYSADMIN to the Claude session unless absolutely needed

Troubleshooting

Common errors and fixes

dbt compile succeeds, run fails with column not found

Stale lineage — run dbt deps and dbt clean, then dbt build --select model+

finding-expensive-queries returns nothing

ACCOUNT_USAGE has a ~45 min latency; also confirm the role has access to SNOWFLAKE.ACCOUNT_USAGE

Check: SHOW GRANTS TO ROLE <role>
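
If the grant is missing, ACCOUNT_USAGE visibility comes from imported privileges on the shared SNOWFLAKE database; a minimal sketch, run as ACCOUNTADMIN with a placeholder role name:

-- Hedged sketch: analytics_cost_role is a hypothetical role.
grant imported privileges on database snowflake to role analytics_cost_role;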

Alternatives

data-engineering-skills vs. others

Alternative                             | When to use instead                        | Trade-off
dbt Cloud IDE                           | You prefer a managed UI over the terminal  | No Claude in the loop
SQL query optimizers (Select.dev, etc.) | You want visual query plans                | Separate tool, separate context

More

Resources

📖 Read the official README on GitHub

🐙 View open issues

🔍 Browse all 400+ MCP servers and skills