Build a private RAG over your downloaded research papers and PDFs
When to use: You've hoarded hundreds of PDFs in ~/Documents/papers and want to actually use them, for example to ask 'what did that paper say about attention decay?'
Prerequisites
- PDFs or docs on disk: any folder of files works, and recursive ingest is supported
Steps
- Ingest the folder. Prompt: "Ingest everything under ~/Documents/papers into local-rag. Skip files larger than 50MB." → Per-file ingest log + 'indexed N files' summary
- Ask questions. Prompt: "Across my papers, what do they say about positional encoding in long-context transformers? Cite the source file and page if possible." → Synthesized answer with source file citations
- Refine search. Prompt: "Just give me the top 5 passages most relevant to 'ring attention', raw — don't summarize." → Ranked passage list
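Before running the ingest step, it can help to preview which files the 50MB cap would exclude. The following is a minimal standard-library sketch, not a description of how local-rag selects files: the function name `find_ingestable`, the extension set, and reading "50MB" as 50 MiB are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: "50MB" in the prompt is read as 50 MiB.
MAX_BYTES = 50 * 1024 * 1024
# Assumption: illustrative extension set; adjust to what your setup accepts.
EXTENSIONS = {".pdf", ".txt", ".md"}


def find_ingestable(root: Path, max_bytes: int = MAX_BYTES) -> tuple[list[Path], list[Path]]:
    """Return (keep, skipped) lists of document files under root, searched recursively."""
    keep: list[Path] = []
    skipped: list[Path] = []
    for path in sorted(root.rglob("*")):
        # Ignore directories and files with other extensions.
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        if path.stat().st_size > max_bytes:
            skipped.append(path)  # over the size cap
        else:
            keep.append(path)
    return keep, skipped


if __name__ == "__main__":
    keep, skipped = find_ingestable(Path.home() / "Documents" / "papers")
    print(f"{len(keep)} files to ingest, {len(skipped)} skipped as larger than 50MB")
```

Comparing the two counts against the 'indexed N files' summary is a quick way to check that the ingest covered what you expected.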
Result: Every paper in the ingested folder is now queryable by topic, a permanent upgrade to your reading life.
Pitfalls
- Scanned PDFs have no extractable text: run an OCR pass (ocrmypdf) before ingesting
- First index of 1000+ files is slow (CPU embeddings): leave it running overnight; incremental re-ingest is fast
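The OCR pass mentioned in the pitfalls can be scripted over a whole folder. The sketch below assumes the `ocrmypdf` command-line tool is installed and on PATH; the helper names `ocr_command` and `ocr_folder` and the mirrored output folder are hypothetical choices for illustration. It relies on ocrmypdf's `--skip-text` option so that pages which already carry a text layer are left alone.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_command(src: Path, dst: Path) -> list[str]:
    """Build the ocrmypdf invocation for one file."""
    # --skip-text: do not OCR pages that already contain text.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def ocr_folder(src_root: Path, dst_root: Path) -> int:
    """OCR every PDF under src_root into a mirrored tree under dst_root.

    Returns the number of files ocrmypdf processed successfully.
    dst_root should live outside src_root so outputs are not picked up again.
    """
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    done = 0
    # Note: "*.pdf" is matched case-sensitively on most Linux filesystems.
    for src in sorted(src_root.rglob("*.pdf")):
        dst = dst_root / src.relative_to(src_root)
        if dst.exists():
            continue  # already produced by an earlier run
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(ocr_command(src, dst))
        if result.returncode == 0:
            done += 1
    return done
```

A call such as `ocr_folder(Path.home() / "Documents" / "papers", Path.home() / "Documents" / "papers-ocr")` (the second folder name is an assumption) would leave a searchable copy of each PDF, and that output folder is then the one to ingest. Because existing outputs are skipped, re-running after adding new papers only processes the new files.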