jaymeklein/embedbase

EmbedBase

A local-first, open-source document embedding system. Ingest documents, search them semantically, and expose results via REST API and MCP server — all without data leaving your machine.

Quickstart

# Clone inside WSL2 (not /mnt/c — see WSL2 notes below)
git clone https://github.com/jaymeklein/embedbase
cd embedbase

# Configure
cp .env.example .env
# Edit .env: set MASTER_API_KEY to a random 32+ char string
# e.g. python -c "import secrets; print(secrets.token_urlsafe(32))"

cp config.example.yaml config.yaml  # skip if config.yaml already exists

# Start the stack (downloads ~90MB model on first run)
docker compose up --build

# UI:  http://localhost:3000
# API: http://localhost:8000
# Docs: http://localhost:8000/docs
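
The key-generation step in the Quickstart can also be scripted. The following is a minimal sketch, assuming `.env` holds plain `KEY=value` lines; the helper names `set_env_var` and `new_master_key` are hypothetical and not part of EmbedBase.

```python
import secrets


def set_env_var(env_text: str, key: str, value: str) -> str:
    """Return env_text with KEY=value set, replacing an existing line or appending one."""
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Match only an uncommented assignment of exactly this key.
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def new_master_key() -> str:
    # token_urlsafe(32) encodes 32 random bytes as 43 URL-safe characters,
    # which satisfies the "32+ char" requirement.
    return secrets.token_urlsafe(32)
```

To apply it, read `.env`, pass its text through `set_env_var(text, "MASTER_API_KEY", new_master_key())`, and write the result back.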

Vector store backends

# Default — Chroma
docker compose up

# pgvector (Postgres 16)
docker compose -f docker-compose.yml -f docker-compose.postgres.yml up

# Qdrant
docker compose -f docker-compose.yml -f docker-compose.qdrant.yml up
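
The three invocations above differ only in the override file layered on top of `docker-compose.yml`. A small sketch of that mapping, for use in a launcher script; the backend labels (`chroma`, `pgvector`, `qdrant`) and the function name are illustrative assumptions, while the file names come from the commands above.

```python
from typing import List, Optional, Dict

# Backend label -> override file (None means the base file alone).
BACKEND_OVERRIDES: Dict[str, Optional[str]] = {
    "chroma": None,
    "pgvector": "docker-compose.postgres.yml",
    "qdrant": "docker-compose.qdrant.yml",
}


def compose_command(backend: str = "chroma") -> List[str]:
    """Build the argv list for starting the stack with the given backend."""
    if backend not in BACKEND_OVERRIDES:
        raise ValueError(f"unknown backend: {backend!r}")
    cmd = ["docker", "compose"]
    override = BACKEND_OVERRIDES[backend]
    if override is not None:
        # Later -f files override settings from earlier ones.
        cmd += ["-f", "docker-compose.yml", "-f", override]
    return cmd + ["up"]
```

The returned list can be passed directly to `subprocess.run`.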

MCP (Claude Desktop / Cursor / Zed)

EmbedBase exposes an MCP server over SSE at http://localhost:8000/mcp/sse (proxied by Nginx at /mcp/). Claude Desktop talks to a remote SSE server via mcp-remote. Add to ~/.config/claude/claude_desktop_config.json (or %APPDATA%\Claude\claude_desktop_config.json on Windows):

{
  "mcpServers": {
    "embedbase": {
      "command": "npx",
      "args": [
        "-y", "mcp-remote",
        "http://localhost:8000/mcp/sse",
        "--header", "Authorization: Bearer ${EMBEDBASE_MASTER_KEY}"
      ],
      "env": {
        "EMBEDBASE_MASTER_KEY": "<your MASTER_API_KEY>"
      }
    }
  }
}

Authenticate with your MASTER_API_KEY. Each key is limited to 60 requests/min (configurable via mcp.rate_limit_rpm); the 61st request within a minute returns HTTP 429 (Too Many Requests).
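
To make the per-key limit concrete, here is a sketch of a sliding-window limiter that produces the behavior described. This README does not say which algorithm EmbedBase uses (fixed window, sliding window, or token bucket), so treat the class below as an illustration under that assumption and not as the server's implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit_rpm` requests per key in any `window_s`-second span."""

    def __init__(self, limit_rpm: int = 60, window_s: float = 60.0):
        self.limit = limit_rpm
        self.window = window_s
        self._hits = defaultdict(deque)  # api_key -> timestamps of accepted requests

    def check(self, api_key: str, now: float) -> int:
        """Return the HTTP status for a request arriving at time `now` (seconds)."""
        hits = self._hits[api_key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429
        hits.append(now)
        return 200
```

A client that receives 429 should wait before retrying; the window frees up as older requests age out.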

Tools: list_workspaces, search_documents (query, collection_ids[], top_k, hybrid, filters), ingest_document (container-local path), list_documents, delete_document.
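
MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. A sketch of the payload for `search_documents` follows; the argument names come from the list above, but their types and the default values chosen here are assumptions, so check the tool schema the server advertises before relying on them.

```python
import json
from typing import Any, Dict, List, Optional


def search_documents_call(
    query: str,
    collection_ids: List[str],
    top_k: int = 5,          # assumed default, not taken from EmbedBase
    hybrid: bool = True,     # assumed default, not taken from EmbedBase
    filters: Optional[Dict[str, Any]] = None,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request that calls the search_documents tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_documents",
            "arguments": {
                "query": query,
                "collection_ids": collection_ids,
                "top_k": top_k,
                "hybrid": hybrid,
                "filters": filters or {},
            },
        },
    }


# Serialized form, as an MCP client would send it over the transport.
payload = json.dumps(search_documents_call("refund policy", ["col-1"], top_k=3))
```

In practice an MCP client library builds this envelope for you; the sketch only shows what crosses the wire.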

Document parsers (OCR, DOCX/PPTX, optional GPU)

PDFs default to the fast PyMuPDF parser (~10 ms/page) — best for text-heavy documents. For scanned PDFs or table extraction, switch to the docling backend in config.yaml:

parsers:
  pdf_backend: docling   # OCR + table structure (CPU ~200-800 ms/page)
  docling_ocr: true
  docling_tables: true

.docx and .pptx always use docling (no lightweight adapter exists), so they work as soon as the worker image carries the ML deps. docling models download lazily on first use; pre-bake them with --build-arg EMBEDBASE_DOCLING_MODELS=true.
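
The routing rules above can be summarized in a few lines. This is a sketch of the decision only, not EmbedBase's dispatcher: the function name is hypothetical, the config value `pymupdf` for the default backend is an assumption (only `docling` appears in this README), and other file types are left out because they are not described here.

```python
from pathlib import Path


def pick_parser(filename: str, pdf_backend: str = "pymupdf") -> str:
    """Return the parser backend name for a file, following the rules above."""
    suffix = Path(filename).suffix.lower()
    if suffix in {".docx", ".pptx"}:
        # Office formats always go through docling, whatever pdf_backend says.
        return "docling"
    if suffix == ".pdf":
        # PDFs follow parsers.pdf_backend from config.yaml.
        return pdf_backend
    raise ValueError(f"file type not covered by this sketch: {suffix!r}")
```

For example, `pick_parser("scan.pdf", pdf_backend="docling")` routes a scanned PDF through OCR, while `pick_parser("deck.pptx")` uses docling regardless of the PDF setting.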

GPU acceleration (NVIDIA RTX only) brings docling to ~30-80 ms/page:

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up

This requires the NVIDIA Container Toolkit and a CUDA-matched torch build. The default cu128 (CUDA 12.8) wheels in worker/Dockerfile.gpu cover every GPU from Turing/RTX 20 upward; swap in a different cu1XX wheel (see pytorch.org/get-started/locally) only if your driver is too old for CUDA 12.8. The default CPU stack has zero NVIDIA dependencies.

No config needed — the GPU is auto-detected. parsers.docling_device defaults to auto: on startup the worker checks for a CUDA device and, if found, selects it and bumps the OCR/layout batch sizes (64) automatically; with no GPU it transparently falls back to CPU. Pin cpu/cuda only if you want to override detection.
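
The detection logic described above amounts to a small decision function. The sketch below is an assumption about its shape, not the worker's code; `resolve_device` is a hypothetical name, and CUDA availability is passed in as a plain boolean so the example runs without torch installed.

```python
from typing import Optional, Tuple

GPU_BATCH_SIZE = 64  # value quoted in the text for OCR/layout batches on GPU


def resolve_device(setting: str, cuda_available: bool) -> Tuple[str, Optional[int]]:
    """Return (device, batch_size); batch_size None means "keep the defaults"."""
    if setting not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"invalid docling_device: {setting!r}")
    if setting == "auto":
        # Auto mode: use the GPU when one is visible, otherwise fall back to CPU.
        if cuda_available:
            return "cuda", GPU_BATCH_SIZE
        return "cpu", None
    # Pinned mode: honor the override. The text does not say whether batch
    # sizes change here, so this sketch leaves them untouched.
    return setting, None
```

With PyTorch installed, `torch.cuda.is_available()` is one way to supply `cuda_available`.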

Flash Attention 2 requires an Ampere or newer GPU (compute capability ≥ 8.0, e.g. RTX 30/40). It is auto-enabled under auto only when both the GPU supports it and flash-attn is installed (built via --build-arg INSTALL_FLASH_ATTN=true). Turing cards (RTX 20 series, e.g. the 2060 Super at 7.5) auto-select CUDA without flash. Forcing docling_flash_attention: true on a sub-Ampere GPU fails fast with a clear error.
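
The Flash Attention rules reduce to a check on compute capability 8.0 or higher. A sketch under stated assumptions: the function name is hypothetical, `forced=None` stands for automatic detection, and the behavior when the option is forced on a capable GPU without flash-attn installed is not described in this README, so the sketch simply returns True in that case.

```python
from typing import Optional, Tuple

MIN_CAPABILITY = (8, 0)  # Ampere and newer


def use_flash_attention(
    forced: Optional[bool],
    capability: Tuple[int, int],
    flash_attn_installed: bool,
) -> bool:
    """Decide whether Flash Attention 2 is enabled for a CUDA device."""
    supported = capability >= MIN_CAPABILITY  # tuple comparison: (7, 5) < (8, 0)
    if forced is True:
        if not supported:
            # Fail fast instead of silently running without flash attention.
            raise RuntimeError(
                f"Flash Attention 2 needs compute capability >= 8.0, got {capability}"
            )
        return True
    if forced is False:
        return False
    # Automatic: both the hardware and the flash-attn package must be present.
    return supported and flash_attn_installed
```

With PyTorch, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple expected here; an RTX 2060 Super reports `(7, 5)`, so it runs on CUDA without flash attention.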

WSL2 notes

  • Clone inside the WSL2 filesystem (~/) — not /mnt/c/
  • Allocate at least 8 GB RAM in %UserProfile%\.wslconfig
  • Use host.docker.internal to reach services on the Windows host (e.g. Ollama)
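
A setup script can warn early if the clone landed on a Windows drive mount. The check below is a sketch: the function name is hypothetical, and it generalizes from `/mnt/c/` to any single-letter drive under `/mnt/`, which is how WSL mounts Windows drives by default.

```python
import re

# /mnt/<drive letter>, optionally followed by more path components.
_WINDOWS_MOUNT = re.compile(r"^/mnt/[a-z](/|$)")


def on_windows_mount(path: str) -> bool:
    """Return True if an absolute WSL path lives on a mounted Windows drive."""
    return _WINDOWS_MOUNT.match(path) is not None
```

Calling it with `os.getcwd()` from inside the clone tells you whether to move the repository under `~/`.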

Security checklist (shared networks)

  • Put Nginx behind a TLS reverse proxy (Caddy recommended)
  • Set a strong, random MASTER_API_KEY (min 32 chars)
  • Remove the ports mapping for api — Nginx is the only ingress
  • Set CHROMA_AUTH_TOKEN to a non-default value
  • Set EMBEDBASE_SECURE_HEADERS=true
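
The environment-variable items on this checklist can be verified before deployment. A minimal sketch, assuming the settings are available as a dict (for example `dict(os.environ)`); the function name is hypothetical, and because the default CHROMA_AUTH_TOKEN value is not stated here, the sketch only checks that the token is set.

```python
from typing import List, Mapping


def audit_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of checklist violations; an empty list means all checks pass."""
    problems = []
    if len(env.get("MASTER_API_KEY", "")) < 32:
        problems.append("MASTER_API_KEY must be at least 32 characters")
    if not env.get("CHROMA_AUTH_TOKEN"):
        # Cannot compare against the default value, which this README does not give.
        problems.append("CHROMA_AUTH_TOKEN is not set")
    if env.get("EMBEDBASE_SECURE_HEADERS", "").lower() != "true":
        problems.append("EMBEDBASE_SECURE_HEADERS should be true")
    return problems
```

The TLS proxy and ports-mapping items concern the Compose and proxy configuration, so they still need a manual review.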

License

Apache 2.0

About

Local-first document embedding and retrieval platform with a REST API and MCP server for LLM agents. Supports hybrid BM25 + semantic search across workspaces and collections, with pluggable parsers, embedders, and vector stores.
