Build a private RAG over your downloaded research papers and PDFs
When to use: You've hoarded hundreds of PDFs in ~/Documents/papers and want to actually use them, e.g. to ask 'what did that paper say about attention decay?'
Prerequisites
- PDFs or docs on disk — Any folder of files — recursive ingest supported
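Recursive ingest with a size cap comes down to walking the folder tree and filtering files by type and size. The Python sketch below covers only that selection step. It is not local-rag's code: the `collect_files` name, the extension set, and reading "50MB" as 50 * 1024 * 1024 bytes are all assumptions.

```python
from pathlib import Path

# Hypothetical selection step; local-rag's real file filtering may differ.
MAX_BYTES = 50 * 1024 * 1024            # assumed meaning of "50MB"
DEFAULT_EXTS = {".pdf", ".txt", ".md"}  # assumed set of ingestible file types


def collect_files(root, max_bytes=MAX_BYTES, exts=DEFAULT_EXTS):
    """Recursively list files under root and return (kept, skipped_too_large)."""
    kept, skipped = [], []
    for path in sorted(Path(root).expanduser().rglob("*")):
        # Ignore directories and files whose extension is not in `exts`.
        if not path.is_file() or path.suffix.lower() not in exts:
            continue
        if path.stat().st_size > max_bytes:
            skipped.append(path)
        else:
            kept.append(path)
    return kept, skipped
```

Calling `collect_files("~/Documents/papers")` returns the files that would be indexed plus those skipped for size, which is roughly the information a per-file ingest log reports.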
Flow
- Ingest the folder
  Prompt: "Ingest everything under ~/Documents/papers into local-rag. Skip files larger than 50MB."
  → Per-file ingest log plus an 'indexed N files' summary
- Ask questions
  Prompt: "Across my papers, what do they say about positional encoding in long-context transformers? Cite the source file and page if possible."
  → Synthesized answer with source-file citations
- Refine search
  Prompt: "Just give me the top 5 passages most relevant to 'ring attention', raw. Don't summarize."
  → Ranked passage list
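The 'Ask questions' and 'Refine search' steps both rest on ranking stored chunks against a query. In a typical RAG setup that is a nearest-neighbour search over embeddings, and file/page citations are only possible if each chunk keeps its source and page. The sketch below assumes exactly that. `Chunk`, `top_k`, and `format_hits` are hypothetical names, and local-rag's real storage, embedding model, and scoring may differ.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical chunk record: text plus its origin, so hits can be cited.
    source: str
    page: int
    text: str
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_embedding, chunks, k=5):
    """Return up to k (score, chunk) pairs, most similar first."""
    scored = ((cosine(query_embedding, c.embedding), c) for c in chunks)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def format_hits(hits):
    """Raw ranked list with file:page citations and no summarization."""
    return [f"{rank}. [{c.source} p.{c.page}] ({score:.3f}) {c.text}"
            for rank, (score, c) in enumerate(hits, start=1)]
```

The query vector has to come from the same embedding model used at ingest time. A linear scan like this is fine for a personal library; much larger corpora usually move to an approximate-nearest-neighbour index.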
Outcome: Every paper you've saved to that folder is now queryable by topic, a permanent upgrade to your reading life.
Pitfalls
- Scanned PDFs have no extractable text — Run an OCR pass first (ocrmypdf) before ingesting
- First index of 1000+ files is slow (CPU embeddings) — Leave it running overnight; incremental re-ingest is fast
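Re-ingest can be fast because an incremental indexer only re-embeds files that changed since the last run. One common way to detect changes is a manifest of per-file size and modification time, sketched below. Whether local-rag works this way isn't stated here, and the manifest format and function names are hypothetical.

```python
import json
from pathlib import Path


def file_signature(path):
    """Cheap change detector: [size_in_bytes, mtime_ns] for one file."""
    st = Path(path).stat()
    # A list (not a tuple) so it compares equal after a JSON round-trip.
    return [st.st_size, st.st_mtime_ns]


def plan_reingest(paths, manifest):
    """Compare current files against the previous run's manifest.

    manifest: dict mapping str(path) -> signature (hypothetical format).
    Returns (to_index, unchanged, removed, new_manifest), where `removed`
    lists manifest keys whose files are no longer in `paths`.
    """
    to_index, unchanged, new_manifest = [], [], {}
    for path in paths:
        key = str(path)
        sig = file_signature(path)
        new_manifest[key] = sig
        if manifest.get(key) == sig:
            unchanged.append(path)  # same size and mtime: nothing to re-embed
        else:
            to_index.append(path)   # new or modified: (re-)embed
    removed = sorted(set(manifest) - set(new_manifest))
    return to_index, unchanged, removed, new_manifest


def load_manifest(manifest_path):
    """Previous run's manifest, or {} on the first run."""
    p = Path(manifest_path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_manifest(manifest_path, manifest):
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
```

Size plus mtime is cheap but can miss an edit that preserves both; hashing file contents catches that at the cost of reading every file on each run.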