Build a private RAG over your downloaded research papers and PDFs
When to use: You've hoarded hundreds of PDFs in ~/Documents/papers and want to actually use them, for example to ask 'what did that paper say about attention decay?'
Prerequisites
- PDFs or docs on disk: any folder of files works; recursive ingest is supported
Workflow
- Ingest the folder
  Prompt: Ingest everything under ~/Documents/papers into local-rag. Skip files larger than 50MB.
  → Per-file ingest log + 'indexed N files' summary
- Ask questions
  Prompt: Across my papers, what do they say about positional encoding in long-context transformers? Cite the source file and page if possible.
  → Synthesized answer with source file citations
- Refine search
  Prompt: Just give me the top 5 passages most relevant to 'ring attention', raw. Don't summarize.
  → Ranked passage list
Result: Every paper in the ingested folder is now queryable by topic, a permanent upgrade to your reading life.
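Before running the ingest step, it can help to preview which files fall under the 50MB limit. The following is a minimal sketch of such a dry run, not part of local-rag itself: the function name `select_files`, the extension set, and the reading of 50MB as 50 MiB are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: "50MB" in the ingest prompt is read here as 50 MiB.
MAX_BYTES = 50 * 1024 * 1024
# Assumption: illustrative extension set; check which formats local-rag accepts.
EXTENSIONS = {".pdf", ".txt", ".md", ".docx"}


def select_files(root: Path, max_bytes: int = MAX_BYTES) -> tuple[list[Path], list[Path]]:
    """Walk root recursively and split matching files into (to_ingest, too_large)."""
    to_ingest: list[Path] = []
    too_large: list[Path] = []
    for path in sorted(root.rglob("*")):
        # Ignore directories and files with extensions outside the assumed set.
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        if path.stat().st_size > max_bytes:
            too_large.append(path)
        else:
            to_ingest.append(path)
    return to_ingest, too_large


if __name__ == "__main__":
    root = Path("~/Documents/papers").expanduser()
    keep, skip = select_files(root)
    print(f"{len(keep)} files to ingest, {len(skip)} skipped as larger than 50MB")
    for path in skip:
        print(f"  skipped: {path}")
```

The script only reads file metadata, so it is safe to run against the real folder; the skipped list shows which papers would need splitting or compressing if you still want them indexed.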
Pitfalls
- Scanned PDFs have no extractable text: run an OCR pass (ocrmypdf) before ingesting
- The first index of 1000+ files is slow (CPU embeddings): leave it running overnight; incremental re-ingest is fast
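The OCR pass mentioned above can be scripted over the whole folder. This is a rough sketch that assumes the `ocrmypdf` command-line tool is installed and on PATH; the helper names (`ocr_command`, `plan_outputs`, `ocr_folder`) and the sibling output folder are hypothetical. It uses ocrmypdf's `--skip-text` option so that pages which already contain text are left alone.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_command(src: Path, dst: Path) -> list[str]:
    """Build the ocrmypdf call for one PDF; --skip-text leaves pages that already have text untouched."""
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def plan_outputs(src_dir: Path, out_dir: Path) -> list[tuple[Path, Path]]:
    """Pair every PDF under src_dir with an output path that mirrors its subfolder."""
    pairs = []
    for src in sorted(src_dir.rglob("*")):
        if src.is_file() and src.suffix.lower() == ".pdf":
            pairs.append((src, out_dir / src.relative_to(src_dir)))
    return pairs


def ocr_folder(src_dir: Path, out_dir: Path) -> None:
    """Run ocrmypdf on each PDF and print a one-line status per file."""
    if shutil.which("ocrmypdf") is None:
        raise SystemExit("ocrmypdf not found on PATH")
    for src, dst in plan_outputs(src_dir, out_dir):
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(ocr_command(src, dst))
        status = "ok" if result.returncode == 0 else f"failed (exit {result.returncode})"
        print(f"{src.name}: {status}")
```

A call such as `ocr_folder(Path("~/Documents/papers").expanduser(), Path("~/Documents/papers-ocr").expanduser())` writes the OCR'd copies to a separate folder (a hypothetical name), which you would then ingest instead of the originals. Keeping the output outside the source folder avoids re-processing the copies on a second run.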