Analyze a CSV/JSON dump by loading it into SQLite
When to use: Someone sent you a CSV with 200k rows and the question 'which segment converts best?' — too big for spreadsheets, too small for a real DB.
Prerequisites
- Source file on disk — Save as .csv or .json under a working folder
- An empty SQLite file path — Choose a location like /tmp/analysis.db; the MCP will create it
Flow
- Create the table and load — Prompt: "Create a table signups in /tmp/analysis.db matching the columns of /data/signups.csv. Load all rows. Tell me the row count." → Table created, row count matches file
- Explore the schema — Prompt: "What columns exist? For each, what's the value distribution (top 5 distinct values for categoricals; min/max/avg for numerics)?" → Per-column profile
- Answer the actual question — Prompt: "Group by signup_source. For each, compute: total signups, conversion rate (signups with completed_onboarding=true / total). Sort by conversion rate." → Decision-grade table with the SQL shown
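The load-then-aggregate flow above can be sketched in plain Python with the stdlib sqlite3 and csv modules. This is a minimal illustration, not the MCP's actual behavior: the inline sample data stands in for /data/signups.csv, and the signup_source / completed_onboarding columns are the ones assumed in the prompts.

```python
import csv
import io
import sqlite3

# Inline sample standing in for /data/signups.csv (contents are illustrative).
SAMPLE = """signup_source,completed_onboarding
ads,true
ads,false
organic,true
organic,true
referral,false
"""

conn = sqlite3.connect(":memory:")  # use "/tmp/analysis.db" to keep a re-queryable file
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Step 1: create the table from the CSV header and load all rows.
cols = list(rows[0].keys())
conn.execute(f"CREATE TABLE signups ({', '.join(f'{c} TEXT' for c in cols)})")
conn.executemany(
    f"INSERT INTO signups VALUES ({', '.join('?' for _ in cols)})",
    [tuple(r.values()) for r in rows],
)
print(conn.execute("SELECT COUNT(*) FROM signups").fetchone()[0])  # row-count check

# Step 3: total signups and conversion rate per signup_source, best first.
# completed_onboarding = 'true' evaluates to 0/1, so AVG() yields the rate.
query = """
SELECT signup_source,
       COUNT(*) AS total_signups,
       AVG(completed_onboarding = 'true') AS conversion_rate
FROM signups
GROUP BY signup_source
ORDER BY conversion_rate DESC
"""
for row in conn.execute(query):
    print(row)
```

Note the design choice: every column is loaded as TEXT first and interpreted at query time, which is exactly what the Pitfalls section below is about.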
Outcome: Defensible answers in 5 minutes, with a .db file you can re-query as new questions come up.
Pitfalls
- CSV columns auto-typed wrong (numbers as TEXT) — After load, run PRAGMA table_info(signups) and CAST or recreate columns with explicit types if needed
- Date strings don't sort/compare correctly as TEXT — Store dates as ISO 8601 (YYYY-MM-DDTHH:MM:SSZ) so lexicographic = chronological; or use julianday() for date math
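Both pitfalls are easy to check directly in SQLite. A minimal sketch (the table name and columns here are illustrative, not taken from the real dump):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simulate a CSV load where everything arrived as TEXT.
conn.execute("CREATE TABLE signups (id TEXT, signed_up_at TEXT)")
conn.execute("INSERT INTO signups VALUES ('42', '2024-03-01T12:00:00Z')")

# Pitfall 1: inspect the declared column types after loading.
for cid, name, ctype, *_ in conn.execute("PRAGMA table_info(signups)"):
    print(name, ctype)  # both columns report TEXT

# A mistyped numeric column still works via an explicit CAST.
print(conn.execute("SELECT CAST(id AS INTEGER) + 1 FROM signups").fetchone()[0])  # 43

# Pitfall 2: ISO 8601 strings already compare chronologically as TEXT;
# julianday() converts them to a day number for arithmetic.
days = conn.execute(
    "SELECT julianday('2024-03-08') - julianday(signed_up_at) FROM signups"
).fetchone()[0]
print(days)  # elapsed days as a float
```

If the CASTs pile up, it is usually cleaner to recreate the table once with explicit INTEGER/REAL column types and copy the rows over.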