EmbedBase

A local-first, open-source document embedding system. Ingest documents, search them semantically, and expose results via REST API and MCP server — all without data leaving your machine.

Quickstart

# Clone inside WSL2 (not /mnt/c — see WSL2 notes below)
git clone https://github.com/your-org/embedbase
cd embedbase

# Configure
cp .env.example .env
# Edit .env: set MASTER_API_KEY to a random 32+ char string
# e.g. python -c "import secrets; print(secrets.token_urlsafe(32))"

cp config.example.yaml config.yaml  # skip if config.yaml already exists

# Start the stack (downloads ~90MB model on first run)
docker compose up --build

# UI:  http://localhost:3000
# API: http://localhost:8000
# Docs: http://localhost:8000/docs
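
The key-generation one-liner above can be expanded into a small helper if you prefer to script the setup. This is a minimal sketch: the function name `generate_master_key` is hypothetical and not part of EmbedBase; only `secrets.token_urlsafe` and the 32-character minimum come from the steps above.

```python
import secrets

MIN_KEY_CHARS = 32  # minimum length the Quickstart asks for


def generate_master_key(nbytes: int = 32) -> str:
    """Return a URL-safe random string; 32 random bytes encode to 43 characters."""
    key = secrets.token_urlsafe(nbytes)
    if len(key) < MIN_KEY_CHARS:
        raise ValueError(f"key has {len(key)} chars, need at least {MIN_KEY_CHARS}")
    return key


if __name__ == "__main__":
    # Paste the printed line into .env
    print(f"MASTER_API_KEY={generate_master_key()}")
```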

Vector store backends

# Default — Chroma
docker compose up

# pgvector (Postgres 16)
docker compose -f docker-compose.yml -f docker-compose.postgres.yml up

# Qdrant
docker compose -f docker-compose.yml -f docker-compose.qdrant.yml up
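
If you launch the stack from a script, the backend choice maps directly onto the overlay files listed above. The sketch below only assembles the argument list; the backend labels (`chroma`, `pgvector`, `qdrant`) and the function name are chosen for illustration and are not EmbedBase settings.

```python
# Overlay files named in this README; the dictionary keys are labels for this sketch.
OVERLAYS = {
    "chroma": None,  # default stack, no overlay file
    "pgvector": "docker-compose.postgres.yml",
    "qdrant": "docker-compose.qdrant.yml",
}


def compose_command(backend: str = "chroma") -> list:
    """Build the `docker compose ... up` argument list for one backend."""
    if backend not in OVERLAYS:
        raise ValueError(f"unknown backend: {backend!r}")
    overlay = OVERLAYS[backend]
    if overlay is None:
        return ["docker", "compose", "up"]
    return ["docker", "compose", "-f", "docker-compose.yml", "-f", overlay, "up"]
```

The returned list can be passed to `subprocess.run` from the repository root.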

MCP (Claude Desktop / Cursor / Zed)

EmbedBase exposes an MCP server over SSE at http://localhost:8000/mcp/sse (Nginx proxies it at /mcp/). Claude Desktop connects to a remote SSE server through mcp-remote. Add the following to ~/.config/claude/claude_desktop_config.json (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):

{
  "mcpServers": {
    "embedbase": {
      "command": "npx",
      "args": [
        "-y", "mcp-remote",
        "http://localhost:8000/mcp/sse",
        "--header", "Authorization: Bearer ${EMBEDBASE_MASTER_KEY}"
      ],
      "env": {
        "EMBEDBASE_MASTER_KEY": "<your MASTER_API_KEY>"
      }
    }
  }
}

Authenticate with your MASTER_API_KEY. Each key is limited to 60 requests per minute (configurable via mcp.rate_limit_rpm); the 61st request within a minute returns HTTP 429.
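
To make the per-key limit concrete, here is a minimal sketch of a sliding-window limiter with the same observable behavior (60 accepted, the 61st rejected with 429). It is not EmbedBase's implementation: whether the server uses a sliding or a fixed window is not stated here, and the class and method names are hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Illustrative per-key limiter; not the server's actual code."""

    def __init__(self, limit_rpm: int = 60, window_s: float = 60.0):
        self.limit = limit_rpm
        self.window_s = window_s
        self._hits = {}  # api_key -> deque of accepted request timestamps

    def check(self, api_key: str, now: float) -> int:
        """Return the HTTP status a request at time `now` would receive."""
        hits = self._hits.setdefault(api_key, deque())
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429
        hits.append(now)
        return 200


limiter = SlidingWindowLimiter()
# 61 requests from one key, spaced 0.5 s apart (all inside one minute).
statuses = [limiter.check("key-a", now=i * 0.5) for i in range(61)]
```

A client that receives 429 should back off and retry after the window has moved on; other keys are unaffected because each key has its own counter.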

Tools: list_workspaces, search_documents (query, collection_ids[], top_k, hybrid, filters), ingest_document (container-local path), list_documents, delete_document.
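
MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for search_documents to show how the listed arguments fit together. The argument types and default values are assumptions (only the argument names come from the tool list above), and in practice your MCP client constructs this message for you.

```python
import json


def build_search_call(query, collection_ids, top_k=5, hybrid=False,
                      filters=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` message for search_documents.

    Types and defaults are illustrative assumptions, not the server's schema.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_documents",
            "arguments": {
                "query": query,
                "collection_ids": list(collection_ids),
                "top_k": top_k,
                "hybrid": hybrid,
                "filters": filters or {},
            },
        },
    }


message = build_search_call("quarterly revenue", ["finance"], top_k=3, hybrid=True)
payload = json.dumps(message)  # string form, ready to send over the MCP transport
```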

Document parsers (OCR, DOCX/PPTX, optional GPU)

PDFs default to the fast PyMuPDF parser (~10 ms/page) — best for text-heavy documents. For scanned PDFs or table extraction, switch to the docling backend in config.yaml:

parsers:
  pdf_backend: docling   # OCR + table structure (CPU ~200-800 ms/page)
  docling_ocr: true
  docling_tables: true

.docx and .pptx always use docling (no lightweight adapter exists), so they work once the worker image includes the ML dependencies. docling models download lazily on first use; to bake them into the image at build time, pass --build-arg EMBEDBASE_DOCLING_MODELS=true.
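
The routing described above can be summarized as a small lookup. This is a sketch of the rule only, not the worker's code: the function name is hypothetical, and `"pymupdf"` as the default value of pdf_backend is an assumption (the section names the parser but not the config string).

```python
from pathlib import Path


def pick_parser(filename: str, pdf_backend: str = "pymupdf") -> str:
    """Return the parser backend for a file, following the rules in this section."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        return pdf_backend  # configurable: fast default or docling
    if suffix in {".docx", ".pptx"}:
        return "docling"  # always docling, regardless of pdf_backend
    # Other formats are outside the scope of this sketch.
    raise ValueError(f"no parser rule for {suffix!r} in this sketch")
```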

GPU acceleration (NVIDIA RTX only) brings docling to ~30-80 ms/page:

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up

This requires the NVIDIA Container Toolkit and a CUDA-matched torch build. The default cu128 (CUDA 12.8) wheels in worker/Dockerfile.gpu cover every GPU from Turing/RTX 20 upward; swap in a different cu1XX wheel (see pytorch.org/get-started/locally) only if your driver is too old for the default. The default CPU stack has zero NVIDIA dependencies.

No configuration is needed: the GPU is auto-detected. parsers.docling_device defaults to auto. On startup the worker checks for a CUDA device and, if it finds one, selects it and raises the OCR and layout batch sizes to 64; with no GPU it falls back to CPU. Set cpu or cuda explicitly only to override detection.

Flash Attention 2 requires Ampere or newer (compute capability ≥ 8.0, e.g. RTX 30/40). Under auto it is enabled only when the GPU supports it and flash-attn is installed (built via --build-arg INSTALL_FLASH_ATTN=true). Turing cards (RTX 20 series, e.g. the 2060 Super at compute capability 7.5) auto-select CUDA without Flash Attention. Forcing docling_flash_attention: true on a pre-Ampere GPU fails fast with a clear error.
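
The detection rules above can be read as a pure decision function. The sketch below is an illustration, not the worker's code: all names are hypothetical, the inputs that the worker would probe at startup are passed in as plain arguments, and two behaviors are assumptions not stated in this section (raising when cuda is pinned without a GPU, and raising when Flash Attention is forced without flash-attn installed).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DeviceChoice:
    device: str            # "cpu" or "cuda"
    flash_attention: bool  # whether Flash Attention 2 would be enabled


def select_device(setting: str = "auto",
                  cuda_available: bool = False,
                  compute_capability: Optional[Tuple[int, int]] = None,
                  flash_attn_installed: bool = False,
                  force_flash: Optional[bool] = None) -> DeviceChoice:
    if setting not in {"auto", "cpu", "cuda"}:
        raise ValueError(f"unknown docling_device: {setting!r}")
    if setting == "cuda" and not cuda_available:
        # Assumption: pinning cuda without a GPU is treated as an error.
        raise RuntimeError("docling_device is cuda but no CUDA device was found")

    device = "cuda" if setting == "cuda" or (setting == "auto" and cuda_available) else "cpu"

    # Flash Attention 2 needs compute capability 8.0 or higher.
    supported = (device == "cuda"
                 and compute_capability is not None
                 and compute_capability >= (8, 0))

    if force_flash is True:
        if not supported:
            raise RuntimeError("Flash Attention 2 requires compute capability >= 8.0")
        if not flash_attn_installed:
            # Assumption: a forced setting without the package also fails fast.
            raise RuntimeError("flash-attn is not installed")
        return DeviceChoice(device, True)
    if force_flash is False:
        return DeviceChoice(device, False)
    # Auto: enable only when both the hardware and the package are present.
    return DeviceChoice(device, supported and flash_attn_installed)
```

For example, an RTX 2060 Super (compute capability 7.5) resolves to `DeviceChoice("cuda", False)` even with flash-attn installed, while an 8.6 card with flash-attn resolves to `DeviceChoice("cuda", True)`.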

WSL2 notes

Clone inside the WSL2 filesystem (~/) — not /mnt/c/
Allocate at least 8 GB RAM in %UserProfile%\.wslconfig
Use host.docker.internal to reach services on the Windows host (e.g. Ollama)

Security checklist (shared networks)

Put Nginx behind a TLS reverse proxy (Caddy recommended)
Set a strong, random MASTER_API_KEY (min 32 chars)
Remove the ports mapping for api — Nginx is the only ingress
Set CHROMA_AUTH_TOKEN to a non-default value
Set EMBEDBASE_SECURE_HEADERS=true
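
The environment-variable items in this checklist can be checked mechanically before deploying. The following is a minimal sketch, assuming a plain KEY=VALUE .env file; the function names are hypothetical, and `"changeme"` stands in for whatever default token .env.example ships with (that value is not given here).

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def audit_env(values: dict, default_chroma_tokens=frozenset({"", "changeme"})) -> list:
    """Return a list of checklist items the given settings fail."""
    problems = []
    if len(values.get("MASTER_API_KEY", "")) < 32:
        problems.append("MASTER_API_KEY must be at least 32 characters")
    # "changeme" is a placeholder for the shipped default, which this sketch does not know.
    if values.get("CHROMA_AUTH_TOKEN", "") in default_chroma_tokens:
        problems.append("CHROMA_AUTH_TOKEN must be set to a non-default value")
    if values.get("EMBEDBASE_SECURE_HEADERS", "").lower() != "true":
        problems.append("EMBEDBASE_SECURE_HEADERS must be true")
    return problems


sample = """
# example .env contents
MASTER_API_KEY=short
CHROMA_AUTH_TOKEN=changeme
EMBEDBASE_SECURE_HEADERS=true
"""
issues = audit_env(parse_env(sample))
```

The TLS proxy and port-mapping items concern the Compose and proxy setup rather than .env, so they are not covered by this check.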

License

Apache 2.0
