data-engineering-skills

Why use it

Key features

7 dbt skills — create, debug, test, document, migrate, refactor, incremental
2 Snowflake skills — find expensive queries, optimize by text/id
Skills auto-activate based on task context
Benchmarked: 53% real-world dbt pass rate, 84% Snowflake optimization pass rate
Includes dependency tracking and output verification steps

Installation

Choose a client

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json · Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Save, then restart the app.

~/.cursor/mcp.json (global) · .cursor/mcp.json (project)

{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Cursor uses the same mcpServers schema as Claude Desktop. Project-level config takes precedence over the global config.

VS Code → Cline → MCP Servers → Edit

{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then choose "Edit Configuration".

~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "data-engineering-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ],
      "_inferred": true
    }
  }
}

Same format as Claude Desktop. Takes effect after restarting Windsurf.

~/.continue/config.json

{
  "mcpServers": [
    {
      "name": "data-engineering-skill",
      "command": "git",
      "args": [
        "clone",
        "https://github.com/AltimateAI/data-engineering-skills",
        "~/.claude/skills/data-engineering-skills"
      ]
    }
  ]
}

Continue uses an array of server objects rather than a map.
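The map-versus-array difference can be sketched in a few lines of Python. This is only an illustration of the shape change between the two snippets on this page; the helper name map_to_array is hypothetical, and the exact schema your Continue version accepts should be checked against its own documentation.

```python
import json


def map_to_array(mcp_servers: dict) -> list:
    """Turn a name-keyed mcpServers map into a list of server objects.

    Hypothetical helper: it mirrors only the two config shapes shown on this page.
    """
    servers = []
    for name, spec in mcp_servers.items():
        entry = {"name": name}
        # Skip annotation keys such as "_inferred"; they are not part of the server spec.
        entry.update({k: v for k, v in spec.items() if not k.startswith("_")})
        servers.append(entry)
    return servers


claude_style = {
    "mcpServers": {
        "data-engineering-skill": {
            "command": "git",
            "args": [
                "clone",
                "https://github.com/AltimateAI/data-engineering-skills",
                "~/.claude/skills/data-engineering-skills",
            ],
            "_inferred": True,
        }
    }
}

continue_style = {"mcpServers": map_to_array(claude_style["mcpServers"])}
print(json.dumps(continue_style, indent=2))
```

The server name moves from the map key into a "name" field, and everything else is carried over unchanged.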

~/.config/zed/settings.json

{
  "context_servers": {
    "data-engineering-skill": {
      "command": {
        "path": "git",
        "args": [
          "clone",
          "https://github.com/AltimateAI/data-engineering-skills",
          "~/.claude/skills/data-engineering-skills"
        ]
      }
    }
  }
}

Add it under context_servers. Zed hot-reloads on save.

claude mcp add data-engineering-skill -- git clone https://github.com/AltimateAI/data-engineering-skills ~/.claude/skills/data-engineering-skills

One-line command. Verify with claude mcp list; remove with claude mcp remove.

Use cases

Practical usage: data-engineering-skills

Debug a failing dbt model without thrashing

👤 Analytics engineers facing a red CI run ⏱ ~20 min intermediate

When to use: dbt run just failed with a cryptic error and you don't know whether the cause is schema, lineage, or SQL.

Prerequisites

dbt project accessible — cd into your dbt repo so Claude can see models/
Skill installed — git clone https://github.com/AltimateAI/data-engineering-skills ~/.claude/skills/data-engineering-skills

Flow

Feed Claude the error + model

Use debugging-dbt-errors. Here's the stderr and models/marts/fct_orders.sql. Diagnose the root cause; don't guess.

→ Claude reads upstream refs, diagnoses in order: schema → lineage → SQL
Apply the fix and verify

Apply the fix and run dbt build --select fct_orders+. Show me the before/after row counts.

→ Clean run + row count verification

Result: Green CI plus a note of the root cause so it doesn't recur.

Pitfalls

Fixing a symptom downstream when the bug is upstream — The skill enforces an upstream-first diagnosis; don't skip the lineage step
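The upstream-first order can be pictured as a walk over the model graph. The sketch below is an illustration, not the skill's implementation: the parents map and every model name other than fct_orders are hypothetical. In dbt's selector syntax, +fct_orders selects the model plus its ancestors, while fct_orders+ (used in the verify step) selects the model plus its descendants.

```python
def ancestors(parents: dict, model: str) -> set:
    """Return every model upstream of `model` (what a leading + adds to a selector)."""
    seen, stack = set(), [model]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendants(parents: dict, model: str) -> set:
    """Return every model downstream of `model` (what a trailing + adds to a selector)."""
    # Invert the parent map into a child map, then reuse the same traversal.
    children = {}
    for child, model_parents in parents.items():
        for parent in model_parents:
            children.setdefault(parent, []).append(child)
    return ancestors(children, model)


# Hypothetical lineage: each key lists the models it refs.
parents = {
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "fct_orders": ["stg_orders", "stg_customers"],
    "rpt_revenue": ["fct_orders"],
}

upstream = ancestors(parents, "fct_orders")      # inspect these first when diagnosing
downstream = descendants(parents, "fct_orders")  # rebuilt by fct_orders+
```

Checking the upstream set before touching fct_orders itself is the step the pitfall warns against skipping.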

Pairs with: bigquery-server · github

Find and fix your top expensive Snowflake queries

👤 Analytics leads with a climbing Snowflake bill ⏱ ~60 min intermediate

When to use: Finance flagged the Snowflake bill and you need to cut it without breaking dashboards.

Prerequisites

Snowflake role with ACCOUNT_USAGE access — ACCOUNTADMIN typically, or a dedicated cost role

Flow

Identify worst offenders

Use finding-expensive-queries to list the top 20 queries in the past 30 days by credit cost. Group by app/user.

→ Ranked table with credits, runtime, warehouse
Optimize each top one

For the top offender, use optimizing-query-by-id <query_id>. Propose rewrites with estimated savings.

→ Rewritten SQL + before/after explain plan
Validate and deploy

Run the rewrite in a test warehouse and confirm the same row count and shape before we swap.

→ Safe swap candidate
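The ranking in the first step of this flow amounts to summing credits per user and sorting. A minimal sketch follows, assuming you have already exported per-query rows into Python dictionaries; the field names and sample values are hypothetical and do not reflect the skill's actual output format.

```python
from collections import defaultdict


def top_spenders(rows: list, n: int = 20) -> list:
    """Sum credits per user and return the n largest totals, highest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["user"]] += row["credits"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical export; real rows would come from your own cost query.
sample_rows = [
    {"query_id": "q1", "user": "etl_bot", "credits": 4.0},
    {"query_id": "q2", "user": "bi_tool", "credits": 1.5},
    {"query_id": "q3", "user": "etl_bot", "credits": 2.5},
]

ranked = top_spenders(sample_rows, n=2)
for user, credits in ranked:
    print(f"{user}: {credits:.1f} credits")
```

Grouping first and ranking second keeps one noisy user with many mid-sized queries from hiding behind a single large query from someone else.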

Result: A prioritized list of fixes with measurable $ savings.

Pitfalls

Rewrites can change the row count silently: always diff the results before deploying (the skill enforces this)
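One way to diff two result sets is an order-insensitive multiset comparison. The sketch below is an assumption about how such a check could look, not the skill's own verification step; it expects both results as lists of tuples already fetched into memory, so it suits samples rather than very large tables.

```python
from collections import Counter


def diff_results(before: list, after: list) -> dict:
    """Compare two query results as multisets of rows, ignoring row order."""
    before_counts, after_counts = Counter(before), Counter(after)
    return {
        "row_count_before": len(before),
        "row_count_after": len(after),
        # Same shape here means both results use the same set of row widths.
        "same_shape": {len(r) for r in before} == {len(r) for r in after},
        # Rows the rewrite dropped, including lost duplicates.
        "missing": sorted((before_counts - after_counts).elements()),
        # Rows the rewrite introduced.
        "unexpected": sorted((after_counts - before_counts).elements()),
    }


# Hypothetical results: the rewrite silently collapsed a duplicate row.
original = [(1, "a"), (2, "b"), (2, "b")]
rewritten = [(2, "b"), (1, "a")]

report = diff_results(original, rewritten)
safe_to_swap = not report["missing"] and not report["unexpected"]
```

Using Counter rather than set keeps duplicate rows visible, which matters when a rewrite adds or removes a DISTINCT.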

Pairs with: bigquery-server

Migrate a pile of stored procs into dbt models

👤 Teams moving off legacy SQL to dbt ⏱ ~90 min advanced

When to use: You've inherited a warehouse of nested CTEs and want them as documented, tested dbt models.

Flow

Point the skill at the source SQL

Use migrating-sql-to-dbt. Here's proc_monthly_revenue.sql. Convert it to dbt models with refs, documentation, and at least 2 tests per model.

→ One or more .sql files, schema.yml with docs and tests
Build and verify

Run dbt build on the new models and compare row counts to the legacy output.

→ Row counts match within tolerance
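"Within tolerance" needs a concrete threshold to be checkable. A minimal sketch using a relative tolerance is shown below; the 0.1% default is an assumption chosen for illustration, not a value the skill prescribes, and a migration that should be exact would use a tolerance of zero.

```python
import math


def counts_match(legacy_count: int, dbt_count: int, rel_tol: float = 0.001) -> bool:
    """Return True when the two row counts differ by at most rel_tol, relative to the larger."""
    return math.isclose(legacy_count, dbt_count, rel_tol=rel_tol)


# Hypothetical counts from the legacy output and the new dbt model.
close_enough = counts_match(1_000_000, 1_000_500)  # differs by 500 rows
too_far = counts_match(1_000_000, 1_002_000)       # differs by 2,000 rows
exact_only = counts_match(1_000, 1_001, rel_tol=0.0)
```

A matching count is a necessary check, not a sufficient one: two models can return the same number of rows with different contents.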

Result: Legacy logic lives on as testable dbt models.

Pitfalls

Hidden side effects in the proc (UPDATEs) — The skill flags side effects — separate them out, don't blindly convert

Pairs with: github

Convert a slow full-refresh model to incremental

👤 Analytics engineers with long-running dbt runs ⏱ ~45 min advanced

When to use: A daily model has grown too big for a full refresh.

Flow

Analyze the model

Use developing-incremental-models on models/events.sql. Pick a strategy (merge / insert_overwrite / delete+insert) and justify.

→ Strategy + unique_key + partition / cluster keys recommended
Implement and back-fill

Apply the incremental config; outline a safe back-fill plan.

→ Model + back-fill steps

Result: Daily runs that finish in minutes, not hours.

Pitfalls

Late-arriving data produces duplicate unique_key values: use the merge strategy and test the key for uniqueness
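The duplicate problem is easiest to see by contrasting append and merge semantics on a late-arriving row. The sketch below simulates both in plain Python; it illustrates the behavior only and is not how dbt or the warehouse implements either strategy. The event_id and status fields are hypothetical.

```python
def append_rows(target: list, batch: list) -> list:
    """Append-only load: every incoming row is inserted, even if its key already exists."""
    return target + batch


def merge_rows(target: list, batch: list, unique_key: str) -> list:
    """Merge load: update the row when the key matches, insert it otherwise."""
    merged = {row[unique_key]: row for row in target}
    for row in batch:
        merged[row[unique_key]] = row
    return list(merged.values())


# Hypothetical table state and a batch containing a late update for event 1.
target = [
    {"event_id": 1, "status": "pending"},
    {"event_id": 2, "status": "done"},
]
late_batch = [
    {"event_id": 1, "status": "done"},
    {"event_id": 3, "status": "pending"},
]

appended = append_rows(target, late_batch)
merged = merge_rows(target, late_batch, "event_id")

appended_keys = [row["event_id"] for row in appended]  # event 1 appears twice
merged_keys = [row["event_id"] for row in merged]      # each key appears once
```

A uniqueness test on the key catches the first case as soon as it happens instead of weeks later in a dashboard.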

Combinations

Combine with other MCPs for 10x efficiency

data-engineering-skill + bigquery-server

Apply the same optimize-by-id pattern to BigQuery expensive queries

Adapt finding-expensive-queries for BigQuery INFORMATION_SCHEMA.JOBS and list top 20.

data-engineering-skill + github

Open a PR per migrated model so each is reviewable

For every migrated model, open a GitHub PR with dbt test output attached.

Tools

What this MCP exposes

Tool	Input	When to call	Cost
creating-dbt-models	model spec	New model	0
debugging-dbt-errors	error log, model	CI or local run failed	0
testing-dbt-models	model	Untested model	0
documenting-dbt-models	model	Undocumented model	0
migrating-sql-to-dbt	legacy SQL	Legacy migration	0
refactoring-dbt-models	model	Hard-to-read model	0
developing-incremental-models	full-refresh model	Runtime too long	0
finding-expensive-queries	lookback window	Cost hunt	ACCOUNT_USAGE query
optimizing-query-text	SQL text	Know the SQL, not the id	0
optimizing-query-by-id	query_id	Have the id from the UI	1 explain

Cost and limits

Operating cost

API quota: Snowflake queries consume credits like any other query; ACCOUNT_USAGE reads are cheap
Tokens per call: 5–15k per dbt skill invocation
Price: Free skill
Tip: Run finding-expensive-queries once a week, not in every session

Security

Permissions, secrets, blast radius

Minimum scope: dbt: read + write to your project · Snowflake: ACCOUNT_USAGE for cost skills

Credential storage: dbt profiles.yml / Snowflake key-pair in env; the skill doesn't store secrets

Outbound data: None from the skill directly

Never grant: SYSADMIN to the Claude session unless absolutely needed

Troubleshooting

Common errors and fixes

dbt compile succeeds, run fails with column not found

Stale lineage: run dbt clean, then dbt deps, then dbt build --select model+ (clean first, since it can remove installed packages that deps then restores)

finding-expensive-queries returns nothing

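ACCOUNT_USAGE views lag by roughly 45 minutes, so very recent queries may not appear yet; also confirm the role can read SNOWFLAKE.ACCOUNT_USAGE (typically via IMPORTED PRIVILEGES on the SNOWFLAKE database)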

Verify: SHOW GRANTS TO ROLE <role>
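The delay matters most for short lookback windows: a window that ends "now" can legitimately come back empty. A minimal sketch of shifting the window back by the approximate delay is shown below; the function name and the 45-minute default are assumptions taken from the note above, not parameters of the skill.

```python
from datetime import datetime, timedelta, timezone


def lookback_window(now: datetime, days: int = 30, latency_minutes: int = 45):
    """Return (start, end) for a lookback that stops before the not-yet-loaded tail."""
    end = now - timedelta(minutes=latency_minutes)
    start = end - timedelta(days=days)
    return start, end


# Fixed timestamp so the example is reproducible.
now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
start, end = lookback_window(now)
print(start.isoformat(), "to", end.isoformat())
```

If a query run in the last hour is missing from the results, waiting and retrying is usually enough before investigating grants.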

Alternatives