Build a private RAG over your downloaded research papers and PDFs
When to use: You've hoarded hundreds of PDFs in ~/Documents/papers and want to actually use them — 'what did that paper say about attention decay?'
Prerequisites
- PDFs or docs on disk — Any folder of files — recursive ingest supported
Flow
- Ingest the folder
  Prompt: Ingest everything under ~/Documents/papers into local-rag. Skip files larger than 50MB.
  → Per-file ingest log + 'indexed N files' summary
- Ask questions
  Prompt: Across my papers, what do they say about positional encoding in long-context transformers? Cite the source file and page if possible.
  → Synthesized answer with source file citations
- Refine search
  Prompt: Just give me the top 5 passages most relevant to 'ring attention', raw; don't summarize.
  → Ranked passage list
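
To preview what the ingest step would pick up before running it, the selection rule (recurse through the folder, skip anything over 50MB) can be approximated in a few lines of Python. This is only a sketch and not how local-rag itself selects files: the function name `select_files`, the PDF-only filter, and reading "50MB" as 50 MiB are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: "50MB" is read as 50 MiB; the real tool may use 50 * 10**6.
MAX_BYTES = 50 * 1024 * 1024


def select_files(root, max_bytes=MAX_BYTES, suffixes=(".pdf",)):
    """Return (kept, skipped) lists of files found recursively under root.

    Hypothetical helper: it mirrors the rule stated in the prompt,
    not the actual logic of any ingest tool.
    """
    kept, skipped = [], []
    for path in sorted(Path(root).expanduser().rglob("*")):
        # Ignore directories and anything that is not one of the wanted types.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Oversized files are collected separately so they can be reported.
        if path.stat().st_size > max_bytes:
            skipped.append(path)
        else:
            kept.append(path)
    return kept, skipped
```

Calling `select_files("~/Documents/papers")` returns the two lists; comparing `len(kept)` against the 'indexed N files' summary is a quick sanity check that nothing was dropped unexpectedly.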
Outcome: Every paper in the ingested folder is now queryable by topic, with answers that cite their source files.
Pitfalls
- Scanned PDFs have no extractable text — Run an OCR pass first (ocrmypdf) before ingesting
- First index of 1000+ files is slow (CPU embeddings) — Leave it running overnight; incremental re-ingest is fast
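
For the scanned-PDF pitfall, the OCR pass can be batched over the whole folder. The sketch below assumes the `ocrmypdf` command-line tool is installed and on the PATH, and that its `--skip-text` option (leave pages that already contain text alone) is available in your version; check `ocrmypdf --help` to confirm. The helper names and the mirrored output folder are hypothetical choices for illustration.

```python
import subprocess
from pathlib import Path


def build_ocr_command(src, dst):
    """Build the ocrmypdf command line for one input/output pair."""
    # --skip-text: pages that already have a text layer are not re-OCRed.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def ocr_folder(src_root, dst_root):
    """OCR every *.pdf under src_root into a mirrored tree under dst_root.

    Returns the list of source files for which ocrmypdf exited non-zero.
    Keep dst_root outside src_root so outputs are not picked up as inputs
    on a later run.
    """
    src_root = Path(src_root).expanduser()
    dst_root = Path(dst_root).expanduser()
    failures = []
    # Note: "*.pdf" is matched case-sensitively on most Linux filesystems.
    for src in sorted(src_root.rglob("*.pdf")):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(build_ocr_command(src, dst))
        if result.returncode != 0:
            failures.append(src)
    return failures
```

Something like `ocr_folder("~/Documents/papers", "~/Documents/papers-ocr")` would then produce a text-searchable copy of the collection, and the ingest prompt can be pointed at that copy instead of the original folder.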