Build a private RAG over your downloaded research papers and PDFs
When to use: You've hoarded hundreds of PDFs in ~/Documents/papers and want to actually use them, e.g. to ask 'what did that paper say about attention decay?'
Prerequisites
- PDFs or docs on disk — Any folder of files — recursive ingest supported
Steps
- Ingest the folder
  Prompt: Ingest everything under ~/Documents/papers into local-rag. Skip files larger than 50MB.
  → Per-file ingest log + 'indexed N files' summary
- Ask questions
  Prompt: Across my papers, what do they say about positional encoding in long-context transformers? Cite the source file and page if possible.
  → Synthesized answer with source file citations
- Refine search
  Prompt: Just give me the top 5 passages most relevant to 'ring attention', raw; don't summarize.
  → Ranked passage list
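To preview which files the ingest prompt is likely to pick up, the recursive walk and the 50MB cutoff can be sketched in a few lines of Python. This is only an illustration, not how local-rag itself selects files: `select_files` and `MAX_BYTES` are hypothetical names, the restriction to `.pdf` is an assumption, and 50MB is taken to mean 50 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Assumption: "50MB" in the prompt is read as 50 * 1024 * 1024 bytes.
MAX_BYTES = 50 * 1024 * 1024


def select_files(root, max_bytes=MAX_BYTES, suffixes=(".pdf",)):
    """Walk root recursively and split matching files into (kept, skipped).

    A file is skipped when it is strictly larger than max_bytes.
    Hypothetical helper for illustration; not part of local-rag.
    """
    kept, skipped = [], []
    for path in sorted(Path(root).rglob("*")):
        # Ignore directories and files with other extensions.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        if path.stat().st_size > max_bytes:
            skipped.append(path)
        else:
            kept.append(path)
    return kept, skipped
```

Calling `select_files(Path("~/Documents/papers").expanduser())` returns the files that fit under the cutoff and the ones that would be skipped, which is a quick way to sanity-check the 'indexed N files' summary.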
Result: Every paper in the ingested folder is now queryable by topic, a permanent upgrade to your reading life.
Pitfalls
- Scanned PDFs have no extractable text — Run an OCR pass first (ocrmypdf) before ingesting
- First index of 1000+ files is slow (CPU embeddings) — Leave it running overnight; incremental re-ingest is fast
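The OCR pass mentioned above can be scripted as a batch job before ingesting. The sketch below is one possible approach, assuming the `ocrmypdf` command-line tool is installed and on PATH; `ocr_command` and `ocr_folder` are hypothetical helper names. It uses ocrmypdf's `--skip-text` option, which leaves pages that already carry a text layer alone, so born-digital PDFs pass through without being re-OCRed. By default it only builds the commands (a dry run) so they can be reviewed first.

```python
import subprocess
from pathlib import Path


def ocr_command(src, dst):
    """Build the ocrmypdf invocation for a single file."""
    # --skip-text: do not OCR pages that already contain text.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def ocr_folder(src_root, dst_root, dry_run=True):
    """Mirror src_root into dst_root, running OCR on each PDF.

    Returns the list of commands. With dry_run=True nothing is executed.
    Keep dst_root outside src_root so outputs are not picked up as inputs.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    commands = []
    for src in sorted(src_root.rglob("*.pdf")):
        dst = dst_root / src.relative_to(src_root)
        cmd = ocr_command(src, dst)
        commands.append(cmd)
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)
            # check=True raises CalledProcessError if ocrmypdf fails.
            subprocess.run(cmd, check=True)
    return commands
```

After running it with `dry_run=False`, point the ingest prompt at the output folder instead of the original one. OCR is itself CPU-heavy, so on a large collection it is a good candidate for the same overnight run as the first index.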