● Official jina-ai 🔑 Bring your own key

Jina AI

by jina-ai · jina-ai/MCP

19 tools for web reading, search (web, arXiv, SSRN, images), reranking, classification, and PDF extraction — Jina's AI infra as MCP.

Jina AI's official MCP exposes their Reader, Search, and processing APIs. Use it for clean markdown extraction from any URL, academic search across arXiv and SSRN, image/text deduplication, reranking, and PDF figure/table extraction. Free tier usable; API key unlocks higher rate limits.

Why use it

Key features

Live demo

What it looks like in practice


Installation

Choose your client

~/Library/Application Support/Claude/claude_desktop_config.json  · Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "jina": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.jina.ai/sse"
      ]
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Restart after saving.

~/.cursor/mcp.json · .cursor/mcp.json
{
  "mcpServers": {
    "jina": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.jina.ai/sse"
      ]
    }
  }
}

Cursor uses the same mcpServers schema as Claude Desktop. The project config takes precedence over the global one.

VS Code → Cline → MCP Servers → Edit
{
  "mcpServers": {
    "jina": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.jina.ai/sse"
      ]
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then "Edit Configuration".

~/.codeium/windsurf/mcp_config.json
{
  "mcpServers": {
    "jina": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.jina.ai/sse"
      ]
    }
  }
}

Same format as Claude Desktop. Restart Windsurf to apply.

~/.continue/config.json
{
  "mcpServers": [
    {
      "name": "jina",
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.jina.ai/sse"
      ]
    }
  ]
}

Continue uses an array of server objects rather than a map.

~/.config/zed/settings.json
{
  "context_servers": {
    "jina": {
      "command": {
        "path": "npx",
        "args": [
          "-y",
          "mcp-remote",
          "https://mcp.jina.ai/sse"
        ]
      }
    }
  }
}

Add it under context_servers. Zed reloads automatically.

claude mcp add jina -- npx -y mcp-remote https://mcp.jina.ai/sse

A one-line install. Verify with claude mcp list; remove with claude mcp remove jina.

Use cases

Real-world scenarios: Jina AI

Digest recent arXiv papers on a topic

👤 Researchers, ML engineers staying current ⏱ ~20 min intermediate

When to use: You want to know what's new on arXiv about your topic without reading 50 abstracts.

Prerequisites
  • Optional Jina API key — jina.ai → dashboard → API key (free tier works for light use)
Flow
  1. Search arXiv
    Use search_arxiv to find papers from the last 30 days about 'speculative decoding for LLM inference'. Return top 20.
    → Paper list with titles, authors, abstracts
  2. Rerank by relevance
    Use sort_by_relevance to rerank against this query: 'practical speedups in production inference, not pure research'. Keep top 8.
    → Reranked list
  3. Summarize each
    For the top 8, extract_pdf the paper, summarize in 3 bullets: contribution, method, reported speedup. Output as a markdown table.
    → Digest-ready summary table

Outcome: A weekly research digest on your topic in minutes, not hours.

Pitfalls
  • extract_pdf on every result is expensive and credits add up — rerank first to cut candidates, then extract only the top N
Combine with: notion

Convert a batch of URLs into clean markdown for RAG

👤 AI engineers building retrieval systems ⏱ ~15 min intermediate

When to use: You have a list of URLs to ingest. You want clean markdown, not raw HTML or a custom parsing pipeline.

Flow
  1. Read URLs in parallel
    Use parallel_read_url on this list [URLs]. Return markdown for each with the original URL as key.
    → Markdown per URL
  2. Dedupe near-duplicates
    Use deduplicate_strings at 0.9 similarity to drop near-duplicate pages (common for mirrored docs).
    → Deduped set with IDs of dropped pages
  3. Save to disk
    Save each to ./knowledge/<slug>.md where slug is derived from the URL path.
    → Markdown files ready for the embedding pipeline

Outcome: A clean corpus for your embedding/indexing step, without writing any scraping code.
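Step 2's deduplicate_strings runs server-side, and its actual similarity metric (likely embedding-based) is not documented here. As an offline sanity check of what a 0.9-similarity threshold does to a corpus, a local stand-in might look like:

```python
from difflib import SequenceMatcher

def dedupe_near(texts, threshold=0.9):
    """Keep the first of each near-duplicate group; report dropped indices.
    Local stand-in using character-level similarity, NOT the MCP tool itself."""
    kept, dropped = [], []
    for i, text in enumerate(texts):
        # Drop this text if it is near-identical to anything already kept.
        if any(SequenceMatcher(None, text, k).ratio() >= threshold for _, k in kept):
            dropped.append(i)
        else:
            kept.append((i, text))
    return [t for _, t in kept], dropped

docs = ["Install via npm.", "Install via npm!", "Totally different page."]
kept, dropped = dedupe_near(docs)
print(dropped)  # → [1]
```

Useful for spot-checking whether 0.9 is too aggressive for your corpus before spending credits on the real tool.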

Pitfalls
  • Paywalled or JS-auth-walled pages return blank or garbage output — spot-check a few URLs, and if the content is thin, fall back to playwright for auth flows
Combine with: filesystem · firecrawl
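Step 3 leaves the slug rule to the model. One possible derivation (a hypothetical helper, not defined by the MCP) is to normalize the URL path:

```python
import re
from urllib.parse import urlparse

def slug_from_url(url: str) -> str:
    """Derive a filesystem-safe slug from a URL's path.
    Hypothetical rule: lowercase the path, collapse every
    non-alphanumeric run into a single hyphen."""
    path = urlparse(url).path.strip("/") or "index"
    slug = re.sub(r"[^a-z0-9]+", "-", path.lower()).strip("-")
    return slug or "index"

print(slug_from_url("https://docs.example.com/guides/Getting_Started.html"))
# → guides-getting-started-html
```

Any deterministic rule works, as long as two different URLs cannot collide onto the same ./knowledge/<slug>.md file for your URL list.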

Classify a batch of text with custom labels

👤 Data analysts, growth teams ⏱ ~15 min beginner

When to use: You have N free-text items (tickets, reviews, survey responses) and want them bucketed into your taxonomy.

Flow
  1. Define labels
    My labels: ['bug', 'feature_request', 'question', 'praise', 'other']. Sample the first 10 items and sanity-check that the labels fit.
    → Labels validated against samples
  2. Batch classify
    Use classify_text on all items with those labels. Return {id, text, label, confidence}.
    → Labelled dataset
  3. Review low-confidence
    Flag items where confidence < 0.6 for manual review. Summarize: distribution, outliers, likely missing labels.
    → Review queue + taxonomy feedback

Outcome: A labeled dataset without fine-tuning a classifier or writing a prompt per item.

Pitfalls
  • Ambiguous labels make the classifier flip-flop on near-ties — make labels mutually exclusive; if items genuinely span categories, allow multi-label output
Combine with: filesystem
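Once classify_text returns records shaped like {id, text, label, confidence} (the shape requested in step 2; the sample data below is invented), step 3's review split is plain data wrangling:

```python
from collections import Counter

THRESHOLD = 0.6  # items below this go to manual review

def split_for_review(items, threshold=THRESHOLD):
    """Split labelled records into confident vs. review queues,
    plus a label distribution over the confident set."""
    confident = [i for i in items if i["confidence"] >= threshold]
    review = [i for i in items if i["confidence"] < threshold]
    distribution = Counter(i["label"] for i in confident)
    return confident, review, distribution

items = [
    {"id": 1, "text": "App crashes on login", "label": "bug", "confidence": 0.93},
    {"id": 2, "text": "Love the new UI", "label": "praise", "confidence": 0.88},
    {"id": 3, "text": "Is dark mode planned?", "label": "question", "confidence": 0.52},
]
confident, review, dist = split_for_review(items)
print(len(review), dict(dist))  # → 1 {'bug': 1, 'praise': 1}
```

The distribution doubles as taxonomy feedback: a bucket that absorbs most items, or an oversized review queue, usually means the labels need splitting or descriptions.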

Combos

Combine with other MCPs for a 10x effect

jina + notion

Weekly research digest posted to Notion

Search arXiv for new 'agentic RAG' papers this week. Summarize each and create a Notion page in the Research Digest database.
jina + firecrawl

Jina for single pages, Firecrawl for full crawls — same clean-markdown output

For the list of URLs, use parallel_read_url (Jina). For the 3 full docs sites, use Firecrawl crawl. Merge into one knowledge dir.
jina + filesystem

Build a local markdown knowledge base from a reading list

Read each URL in urls.txt, dedupe, save to ./knowledge/<hash>.md. Overwrite only if content changed.

Tools

What this MCP provides

Tool | Inputs | When to call | Cost
search_web | query, num_results? | General web search | credits per call
search_arxiv / search_ssrn / search_bibtex / search_images / search_jina_blog | query | Targeted searches | credits per call
parallel_search_web / parallel_search_arxiv / parallel_search_ssrn | query[] | Multi-query research in one call | credits × N queries
read_url | url | Clean content extraction from any URL | credits per page
parallel_read_url | url[] | Batch webpage ingestion | credits × N pages
capture_screenshot_url | url | Visual snapshot of a page | credits
sort_by_relevance | documents, query | Rerank after search for quality | credits
classify_text | texts, labels | Zero-shot classification | credits per text
deduplicate_strings / deduplicate_images | items, threshold | Remove near-duplicates from a corpus | credits
extract_pdf | url or file | Get structured content from PDFs | credits per PDF
expand_query / primer / guess_datetime_url | utility | Helpers around search tuning | credits (minor)

Cost and limits

What it costs

API quota
Free tier available with rate limits; paid tiers scale
Tokens per call
Output is the bigger cost — PDFs and dedupes can return 10k+ tokens
Money
Jina API credits, typically metered per request. See jina.ai/pricing.
Tip
Rerank before extracting — extract_pdf is expensive. Cache read_url outputs locally; most pages don't change daily.
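The caching tip can be sketched as a thin file cache. The fetch callable is injected (whatever actually invokes read_url or the Reader endpoint) and hypothetical; only the caching rule itself is shown:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("./.jina_cache")  # assumed location, pick your own

def cached_read(url: str, fetch) -> str:
    """Return cached markdown for `url`, calling `fetch(url)` only on a miss.
    `fetch` is whatever spends credits (e.g. a read_url call); injecting it
    keeps the cache logic testable offline."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()[:16]
    path = CACHE_DIR / f"{key}.md"
    if path.exists():
        return path.read_text(encoding="utf-8")
    markdown = fetch(url)  # credits are spent only here
    path.write_text(markdown, encoding="utf-8")
    return markdown
```

For pages that do change, add a max-age check on the file's mtime before trusting the cached copy.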

Security

Permissions, secrets, blast radius

Credential storage: JINA_API_KEY env var (optional for many tools, required for heavy use)
Outbound traffic: all calls go to api.jina.ai / r.jina.ai / s.jina.ai — your queries and URLs are visible to Jina

Troubleshooting

Common errors and fixes

429 Too Many Requests

Free tier has low rate limits. Add a JINA_API_KEY env var and upgrade at jina.ai for burst capacity.

read_url returns empty markdown

The page may be auth-walled or bot-blocked. Try a different User-Agent via tool options, or fall back to playwright/firecrawl.

classify_text assigns everything to 'other'

Your labels may be too narrow or too similar. Add label descriptions ('bug: user reports something broken') for better zero-shot accuracy.

search_arxiv misses recent papers

arXiv index may lag; cross-check with a direct arxiv.org search. Use expand_query to broaden terms.

Alternatives

Jina AI compared

Alternative | When to use | Trade-off
Firecrawl | You need full-site crawls or JSON-schema extraction | Crawl-focused; Jina's superpower is the breadth of processing tools beyond just reading
Exa Search MCP | You want semantic/neural web search as a primary interface | Stronger on semantic retrieval; narrower than Jina's toolbox
Brave Search MCP | You want an independent search index + privacy | Search only, no reader/rerank/classify

More

Resources

📖 Read the official README on GitHub

🐙 Open issues

🔍 All 400+ MCP servers and Skills