mcp-local-rag: Install & Live Demo

Why use it

Key features

Fully local: embeddings are computed by an on-device model, with no network access after the first run
Hybrid search: semantic (embedding) similarity boosted by keyword matching
Supports PDF, DOCX, TXT, Markdown, and HTML, the common personal-knowledge formats
Single-command setup via npx mcp-local-rag; also usable as a CLI
CPU-only: no GPU required, so it runs on any modern laptop
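
The hybrid search idea can be pictured with a toy scorer. This is a minimal sketch, not the server's actual ranking code: the function names, the whitespace tokenizer, and the 0.3 boost weight are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_overlap(query, text):
    """Fraction of distinct query terms that appear in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def hybrid_score(query, query_vec, chunk_text, chunk_vec, boost=0.3):
    """Semantic similarity, boosted by exact keyword matches (hypothetical weighting)."""
    semantic = cosine(query_vec, chunk_vec)
    return semantic * (1.0 + boost * keyword_overlap(query, chunk_text))
```

With this weighting, two chunks that are equally close in embedding space are separated by how many query terms they contain verbatim, which is why adding specific keywords to a query tends to help.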

Live demo

What it looks like in practice


Install

Choose your client

Claude Desktop · macOS: ~/Library/Application Support/Claude/claude_desktop_config.json · Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "local-rag": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-local-rag"
      ]
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Restart after saving.
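
If the config file already lists other servers, the entry has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming the file is plain JSON with an mcpServers map as shown above; the helper name add_server is hypothetical and not part of any client.

```python
import json
from pathlib import Path

SERVER_ENTRY = {"command": "npx", "args": ["-y", "mcp-local-rag"]}

def add_server(config_path, name="local-rag", entry=SERVER_ENTRY):
    """Add or update one entry under mcpServers, leaving other keys untouched."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # setdefault keeps any servers that are already configured.
    config.setdefault("mcpServers", {})[name] = dict(entry)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Back up the config before running anything like this, since a malformed file can stop the client from loading any server.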

Cursor · global: ~/.cursor/mcp.json · project: .cursor/mcp.json

{
  "mcpServers": {
    "local-rag": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-local-rag"
      ]
    }
  }
}

Cursor uses the same mcpServers schema as Claude Desktop. Project config takes precedence over global config.

VS Code → Cline → MCP Servers → Edit

{
  "mcpServers": {
    "local-rag": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-local-rag"
      ]
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then "Edit Configuration".

~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "local-rag": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-local-rag"
      ]
    }
  }
}

Same format as Claude Desktop. Restart Windsurf to apply.

~/.continue/config.json

{
  "mcpServers": [
    {
      "name": "local-rag",
      "command": "npx",
      "args": [
        "-y",
        "mcp-local-rag"
      ]
    }
  ]
}

Continue uses an array of server objects instead of a map.
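
The difference between the two shapes is mechanical. A small sketch of the conversion, assuming each array item is the map entry plus a "name" key as in the snippet above; to_continue_format is a hypothetical helper name.

```python
def to_continue_format(mcp_servers):
    """Convert a {name: entry} map into a list of entries that carry a "name" key."""
    return [{"name": name, **entry} for name, entry in mcp_servers.items()]

# The map form used by Claude Desktop, Cursor, Cline, and Windsurf.
servers = {"local-rag": {"command": "npx", "args": ["-y", "mcp-local-rag"]}}
continue_servers = to_continue_format(servers)
```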

~/.config/zed/settings.json

{
  "context_servers": {
    "local-rag": {
      "command": {
        "path": "npx",
        "args": [
          "-y",
          "mcp-local-rag"
        ]
      }
    }
  }
}

Add it under context_servers. Zed reloads automatically on save.

claude mcp add local-rag -- npx -y mcp-local-rag

A single line. Verify with claude mcp list. Remove with claude mcp remove local-rag.

Use cases

Real-world uses: mcp-local-rag

Build a private RAG over your downloaded research papers and PDFs

👤 Researchers, students, knowledge workers ⏱ ~30 min beginner

When to use: You've hoarded hundreds of PDFs in ~/Documents/papers and want to actually use them, for example to ask 'what did that paper say about attention decay?'

Prerequisites

PDFs or docs on disk: any folder of files works; recursive ingest is supported

Workflow

Ingest the folder

Ingest everything under ~/Documents/papers into local-rag. Skip files larger than 50MB.

→ Per-file ingest log + 'indexed N files' summary

Ask questions

Across my papers, what do they say about positional encoding in long-context transformers? Cite the source file and page if possible.

→ Synthesized answer with source file citations

Refine search

Just give me the top 5 passages most relevant to 'ring attention', raw; don't summarize.

→ Ranked passage list
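
To preview what a prompt like the first one would pick up, the folder can be walked ahead of time. This is a sketch of the selection logic only, not what the server does internally: the extension set is an assumption derived from the supported formats, and collect_files is a hypothetical helper.

```python
from pathlib import Path

# Assumed extensions for the supported formats (PDF, DOCX, TXT, Markdown, HTML).
SUPPORTED = {".pdf", ".docx", ".txt", ".md", ".html"}
MAX_BYTES = 50 * 1024 * 1024  # the 50MB cutoff used in the ingest prompt

def collect_files(root, max_bytes=MAX_BYTES):
    """Recursively list supported files under root that are at most max_bytes."""
    root = Path(root).expanduser()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in SUPPORTED
        and p.stat().st_size <= max_bytes
    )
```

Calling collect_files("~/Documents/papers") returns the candidate paths, which gives a rough idea of how long the first index will take.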

Result: Every paper you've downloaded is now queryable by topic, a permanent upgrade to how you read.

Pitfalls

Scanned PDFs have no extractable text. Fix: run an OCR pass (ocrmypdf) before ingesting.
The first index of 1000+ files is slow (CPU embeddings). Fix: leave it running overnight; incremental re-ingest is fast.

Combine with: filesystem

Query confidential contracts / HR docs without leaking to any cloud

👤 Legal ops, HR, compliance ⏱ ~20 min intermediate

When to use: Documents are too sensitive for cloud-hosted embedding or model APIs (OpenAI, Claude). You need search but can't send content anywhere.

Workflow

Ingest

Ingest /secure/contracts/*.pdf into local-rag.

→ Files indexed locally; confirm no network call was made

Query

Which contracts have an auto-renewal clause longer than 12 months?

→ List of candidate contracts with the clause quoted

Result: A searchable private corpus, with indexing and retrieval that never leave the machine.

Pitfalls

With cloud Claude, retrieved passages and answers still go to Anthropic: the embeddings and index are local, but the conversation isn't. Fix: if answers must also stay local, pair the server with a local LLM via Ollama or LM Studio instead of cloud Claude.

Combine with: filesystem

Index your codebase docs and Markdown for instant 'where is this explained?' lookups

👤 Engineers on large repos ⏱ ~15 min beginner

When to use: Your team wiki, design docs, and ADRs are scattered across Markdown files, and grep doesn't find things by meaning.

Workflow

Ingest the docs tree

Ingest all .md files under ./docs. Re-ingest when I ask.

→ Indexed file list

Ask architecture questions

Where's the rationale for using event sourcing instead of CRUD? I remember an ADR about it.

→ Pointer to the specific ADR file + quoted rationale

Result: Team knowledge you can search by concept, not filename.

Pitfalls

Docs drift: old ADRs contradict the current state. Fix: re-ingest before important searches, and call delete_file for deprecated docs.

Combine with: filesystem

Combinations

Combine with other MCPs for more leverage

local-rag + filesystem

Watch a folder, re-ingest files when they change

Every time a file under ~/Notes changes, re-ingest it into local-rag.
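
Detecting which files changed is the part that benefits from a concrete rule. One simple approach, sketched here under the assumption that modification times are a good enough signal, is to compare two snapshots of the folder; the helper names are hypothetical and neither MCP server is known to expose them.

```python
from pathlib import Path

def snapshot(root):
    """Map each file path under root to its last-modified time in nanoseconds."""
    files = Path(root).expanduser().rglob("*")
    return {str(p): p.stat().st_mtime_ns for p in files if p.is_file()}

def changed_files(previous, current):
    """Paths that are new, or whose modification time differs from the previous snapshot."""
    return sorted(path for path, mtime in current.items() if previous.get(path) != mtime)

def removed_files(previous, current):
    """Paths that existed before but are gone now (candidates for delete_file)."""
    return sorted(set(previous) - set(current))
```

Paths from changed_files would be passed to ingest_file, and paths from removed_files to delete_file, so the index tracks the folder in both directions.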

local-rag + firecrawl

Scrape a docs site then feed to local-rag for offline querying

Crawl docs.example.com, save each page as Markdown, then ingest all of them into local-rag.

local-rag + playwright

Capture JS-rendered pages and ingest their extracted text

Open this SPA, grab the rendered HTML, ingest_data it into local-rag with the URL as source.

Tools

What this MCP exposes

Tool	Inputs	When to call	Cost
ingest_file	path: str | str[]	Add one or more files to the index	CPU only
ingest_data	html: str, source_url?: str	Add a raw HTML blob — useful after scraping	CPU only
query_documents	query: str, top_k?: int	Main retrieval call — use before answering user questions	free
list_files		See what's indexed	free
delete_file	path: str	Remove a stale/irrelevant file from the index	free
status		Sanity check index size	free
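
Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 tools/call request. The sketch below builds such a request for query_documents; the argument names are taken from the table above and may not match the server's actual input schema exactly, and the client normally constructs this message for you.

```python
import json

def tool_call_request(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

# Argument names follow the table above (assumed, not verified against the server).
request = tool_call_request(1, "query_documents", {"query": "ring attention", "top_k": 5})
print(json.dumps(request, indent=2))
```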

Cost and limits

What it costs to run

API quota: None; everything runs locally
Tokens per call: Query results run roughly 500-3000 tokens, depending on top_k
Monetary: Free. One-time ~90MB model download.
Tip: Set top_k to 5-8 for most questions; going higher costs more tokens and rarely improves answers.

Security

Permissions, secrets, scope

Credential storage: None; no API keys are needed

Data egress: None from the server after the model download. Indexing and retrieval run on your machine; passages returned to a cloud-hosted model are still sent to that provider.

Single-user — no multi-user isolation. Don't share the index directory.
Changing embedding model requires wiping the DB and re-ingesting everything.

Troubleshooting

Common errors and fixes

First query is slow / seems to hang

Embedding model is downloading on first run (~90MB). Subsequent calls are fast.

Verify: Check ~/.cache/mcp-local-rag for the model file

PDF ingest returns 0 chunks

PDF is likely scanned (image-only). Run ocrmypdf input.pdf output.pdf first.

Verify: pdftotext input.pdf - (little or no output means the PDF has no text layer)
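
When checking many PDFs, the "is this scanned?" judgment can be turned into a simple rule applied to the extracted text. A heuristic sketch: the 50 characters-per-page threshold is an arbitrary assumption, and needs_ocr is a hypothetical helper that expects the text and page count to have been obtained separately.

```python
def needs_ocr(extracted_text, page_count, min_chars_per_page=50):
    """Heuristic: treat a PDF as scanned when it yields almost no text per page."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Drop whitespace, including the form feeds some extractors emit between pages.
    visible = "".join(extracted_text.split())
    return len(visible) / page_count < min_chars_per_page
```

Files flagged by this check are the ones worth sending through ocrmypdf before ingesting.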

Results feel irrelevant

Pure semantic search struggles with short queries. Add more specific keywords to the query; the hybrid search already boosts exact keyword matches.

Out of memory on large PDFs

Split the PDF first, or raise Node heap: NODE_OPTIONS=--max-old-space-size=8192
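
When a client launches the server from a JSON config, the variable usually has to be set in that config rather than in a shell. A sketch of adding it, assuming the client honors an env map on the server entry (check your client's documentation); with_node_heap is a hypothetical helper.

```python
import json

def with_node_heap(entry, megabytes=8192):
    """Return a copy of a server entry with NODE_OPTIONS raising the Node heap limit."""
    env = dict(entry.get("env", {}))
    env["NODE_OPTIONS"] = f"--max-old-space-size={megabytes}"
    return {**entry, "env": env}

entry = {"command": "npx", "args": ["-y", "mcp-local-rag"]}
print(json.dumps({"mcpServers": {"local-rag": with_node_heap(entry)}}, indent=2))
```

The value is in megabytes, so 8192 allows a heap of about 8GB; pick a value below the machine's physical memory.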

Alternatives

mcp-local-rag vs. others

Alternative	When to use	Trade-off
Chroma MCP / Qdrant MCP	You want a real vector DB with multi-user, scaling, metadata filters	More setup, usually requires a running server
OpenAI Assistants file_search	You're OK sending documents to OpenAI's cloud	Not local and billed by usage, but zero setup, and hosted retrieval may be more accurate
ChatGPT Projects / Claude Projects file upload	Small document set (<20 files) and you use the hosted chat	Not an MCP; can't be scripted

More

Resources

📖 Read the official README on GitHub

🐙 View open issues
