A software-agnostic DJ music library manager. Maintains its own SQLite database as the source of truth for track metadata, syncs to DJ software (Mixxx first; Rekordbox and Serato as future adapters).
Designed for agent-operated workflows: JSON output, dry-run defaults, per-track error isolation, and automatic backups before every write. Eventually exposed as an MCP server so AI agents can drive your full library pipeline natively.
uv sync
multidj --helpOptional audio analysis (BPM, key, energy detection):
uv sync --extra analysisOptional semantic embeddings, clustering, similarity search, and DJ suggestions:
uv sync --extra embeddingsThe embeddings extra adds CLAP audio encoding (laion/larger_clap_music, 512-dim), UMAP+HDBSCAN clustering, and the multidj similar / multidj suggest commands. Model weights (~1.5 GB) are downloaded on first use.
Optional CLaMP3 backend (for future text→audio agent vibe search):
uv sync --extra clamp3
git submodule update --init vendor/clamp3Optional metadata enrichment (Discogs + MusicBrainz):
uv sync --extra enrichMultiDJ keeps its own database separate from Mixxx:
| Database | Default path | Purpose |
|---|---|---|
| MultiDJ DB | ~/.multidj/library.sqlite |
Source of truth — all metadata, crates, BPM/key/energy |
| Mixxx DB | ~/.mixxx/mixxxdb.sqlite |
Read on import; written to on sync |
You never pass the Mixxx DB as --db. The --db flag (or multidj config set-db) always refers to the MultiDJ DB.
# 1. Import your Mixxx library into the MultiDJ DB (reads Mixxx, writes to ~/.multidj/library.sqlite)
multidj import mixxx --apply
# 2. (Optional) Store your MultiDJ DB path so --db is automatic
multidj config set-db ~/.multidj/library.sqlite # only needed if you want a non-default path
# 3. (Optional) Set [mixxx].path in ~/.multidj/config.toml so --mixxx-db is automatic:
# [mixxx]
# path = "~/.mixxx/mixxxdb.sqlite"If you don't use Mixxx, import directly from your music folder:
multidj import directory ~/Music/All_Tracks --apply# Reads music_dir and mixxx path from ~/.multidj/config.toml automatically
multidj pipeline --apply
# Dry-run first to preview what would change
multidj pipeline
# Override paths for one-off runs
multidj pipeline --apply \
--music-dir ~/Music/Other_Collection \
--mixxx-db ~/backups/mixxxdb.sqlite--music-dir and --mixxx-db are optional — omit to use [pipeline].music_dir and [mixxx].path from config:
- No
--music-dirand no[pipeline].music_dir→ skip directory import step - No
--mixxx-dband no[mixxx].path→ skip Mixxx sync step
Pick up new files, clean names, and push to Mixxx without running BPM/key/energy analysis:
multidj pipeline --apply --phase quickThis runs: import → dedupe → clean_text → fix_mismatches → parse → crates → sync → report
Skips all heavy steps: BPM/key/energy analysis, audio embeddings, cue detection, clustering, metadata enrichment.
Sync pushes artist, title, album, genre, rating, and play count, and repopulates crate membership from MultiDJ. It also fills BPM only when Mixxx has none (NULL/0) and never overwrites an existing Mixxx BPM. Key is never written by sync.
⚠️ Writing a rawlibrary.bpmvalue without a matching BeatGrid BLOB is a known-unstable stopgap — Mixxx may display it inconsistently. For BPM that Mixxx recognizes reliably, useanalyze bpm→analyze mixxx-blobs(writes a real BeatGrid-2.0 BLOB).
The full pipeline handles this automatically — it imports Mixxx's existing analysis first, then only analyzes tracks with missing values. To run just these steps:
multidj import mixxx-analysis --apply # pull Mixxx's existing BPM/key into MultiDJ
multidj analyze bpm --apply # detect BPM for tracks still missing it
multidj analyze key --apply # detect key for tracks still missing it
multidj analyze mixxx-blobs --apply # write BeatGrid + KeyMap BLOBs back to Mixxxanalyze bpm and analyze key skip tracks that already have values — use --force to reanalyze everything.
multidj pipeline chains 19 steps across 4 phases, in order:
Phase 1 INGEST import → dedupe → clean_text → fix_mismatches → parse
Phase 2 ANALYZE mixxx_import → bpm → key → energy (fast — no heavy models)
Phase deep embed → cues (slow — run at night)
Phase 3 ENRICH enrich_meta → enrich_genre → clean_genres
Phase 4 SYNC cluster → crates → sync → mixxx_blobs → report
Phase quick import → dedupe → clean_text → fix_mismatches → parse → mixxx_import → crates → sync → mixxx_blobs → report
(everyday workflow: pulls Mixxx's detected BPMs, syncs to Mixxx; no librosa analysis)
multidj pipeline --apply # execute everything
multidj pipeline # dry-run: preview without writingThe pipeline is idempotent and incremental — every analyze step skips tracks it has already processed, so it's safe to re-run daily. It takes a single backup at the start.
multidj pipeline --apply --phase quick # everyday: import + BPMs from Mixxx + sync
multidj pipeline --apply --phase analyze # BPM/key/energy detection (fast, no GPU)
multidj pipeline --apply --phase deep # embeddings + cues (slow — run at night)
multidj pipeline --apply --phase ingest # only import/dedupe/clean_text/fix_mismatches/parse
multidj pipeline --apply --phase enrich # only enrich_meta/enrich_genre/clean_genres
multidj pipeline --apply --phase sync # only cluster/crates/sync/mixxx_blobs/reportmultidj pipeline --apply --skip-sync # rebuild crates without touching Mixxx
# Skip optional steps: --skip-import, --skip-bpm, --skip-key, --skip-energy,
# --skip-cues, --skip-embed, --skip-cluster, --skip-enrich, --skip-enrich-genre,
# --skip-crates, --skip-sync, --skip-mixxx-blobs, --skip-report
# Write report to custom path
multidj pipeline --apply --report-output /tmp/report.html
# Disable report generation
multidj pipeline --skip-reportThe pipeline now generates a standalone interactive dashboard report:
multidj pipeline --applyDefault output:
./multidj_report.html
Options:
multidj pipeline --report-output /tmp/report.html
multidj pipeline --skip-reportThe dashboard includes track counts, metadata coverage, top genres, crate exploration, and crate track harmonic transition indicators.
Generate dashboard directly (read-only, no --apply required):
multidj report dashboard
multidj report dashboard --output /path/to/report.htmlUse the standalone command to generate the HTML report without touching anything else:
multidj report dashboard
multidj report dashboard --output /path/to/report.htmlOn first run, MultiDJ will ask for your music directory and save it to ~/.multidj/config.toml.
~/.multidj/config.toml controls which crate dimensions are generated:
[pipeline]
music_dir = "~/Music/All_Tracks" # scanned by pipeline step 1
[crates]
bpm = true # BPM:90-105, BPM:125-130, etc.
key = true # Key:1A, Key:8B, etc. (Camelot wheel)
genre = true # Genre:House, Genre:Techno, etc.
energy = true # Energy:Low, Energy:Mid, Energy:High
language = true # Lang:Hebrew, etc.
[bpm]
min_tracks = 3 # suppress crates with fewer tracks than this
[energy]
low_max = 0.33
high_min = 0.67
[mixxx]
path = "~/.mixxx/mixxxdb.sqlite" # used by pipeline/sync/import when --mixxx-db is omittedmultidj import mixxx [--mixxx-db PATH] [--apply] [--no-backup]
multidj import directory PATH [--apply] [--no-backup]
multidj import mixxx-analysis [--mixxx-db PATH] [--apply] [--force] [--limit N]Tracks imported from any path are first-class library members — they flow through all pipeline steps regardless of where their files live.
import mixxx-analysis is a one-way pull of Mixxx's own BPM/key analysis into MultiDJ (matched by path). The pipeline runs this first in the analyze phase, so tracks Mixxx already analyzed skip MultiDJ's own BPM/key detection.
multidj sync mixxx [--mixxx-db PATH] [--apply] [--no-backup]Pushes dirty tracks and crates to Mixxx. MultiDJ is the source of truth — Mixxx is a display layer.
Cues in Mixxx. On sync, high-confidence intro/drop/outro cues are self-annotated into
Mixxx hot-cue slots — Intro → slot 0 (blue), Drop → slot 1 (red), Outro → slot 2 (green).
MultiDJ replaces only the slots it manages and never deletes hot cues from Mixxx —
running cues clear removes cues from the MultiDJ DB only; cues already written into Mixxx
(and any you set by hand) are preserved.
multidj scan # track counts, metadata coverage, crate count
multidj scan --verbose # also list all DB tables
multidj audit genres # distribution, case collisions, uninformative
multidj audit genres --top 50
multidj audit metadata # field coverage percentages
multidj backup
multidj backup --backup-dir /tmp/backupsmultidj clean genres # dry-run: case variants, uninformative → NULL, whitespace
multidj clean genres --apply
multidj clean genres --limit 50
multidj clean text # dry-run: clean artist/title/album text + remove mapped title/artist suffix garbage
multidj clean text --applymultidj analyze bpm # dry-run: list tracks needing BPM
multidj analyze bpm --apply # detect BPM from start/middle/end windows; report variable-BPM tracks
multidj analyze bpm --force # reprocess tracks that already have BPM
multidj analyze key # dry-run: list candidates
multidj analyze key --apply # detect key (Camelot) and write to DB
multidj analyze key --write-tags # also write audio file tags
multidj analyze key --force # overwrite existing keys
multidj analyze energy # dry-run: list tracks needing energy score
multidj analyze energy --apply # detect energy (0.0–1.0) and write to DB
multidj analyze energy --force # reprocess all tracksWrite proper BeatGrid-2.0 + KeyMap-1.0 protobuf BLOBs into the Mixxx DB so Mixxx recognizes BPM/key reliably (the correct path for stable BPM — see the sync note above):
multidj analyze mixxx-blobs # dry-run
multidj analyze mixxx-blobs --apply # write BeatGrid + KeyMap BLOBs to Mixxx
multidj analyze mixxx-blobs --apply --force # rewrite even where Mixxx already has analysis
multidj analyze mixxx-blobs --apply --lock-bpmLogs SKIPPED (Mixxx already owns the BPM) or WROTE per track so the BPM/key
protection is observable.
Structural segmentation (intro/verse/chorus/drop/outro) via allin1 + librosa
cross-validation. On sync mixxx, high-confidence intro/drop/outro are pushed to Mixxx
hot-cue slots 0/1/2 (see "Cues in Mixxx" below).
multidj analyze cues # dry-run
multidj analyze cues --apply # detect and store cue_points
multidj analyze cues --force # re-detect (manual cues are never overwritten)
multidj cues clear --apply # remove all auto-detected cues from the MultiDJ DBmultidj analyze embed # dry-run: show how many tracks need embedding
multidj analyze embed --apply # encode with CLAP (laion/larger_clap_music, 512-dim)
multidj analyze embed --apply --model clamp3 # encode with CLaMP3 (768-dim, cross-modal)
multidj analyze embed --force # re-encode all tracks
multidj analyze embed --limit 50 # encode at most 50 tracks this runCLAP and CLaMP3 embeddings coexist per track in the embeddings table (composite key on (track_id, model_name)).
multidj cluster vibe # dry-run: show cluster count that would be found
multidj cluster vibe --apply # UMAP+HDBSCAN → write Vibe/ auto-crates to DB
multidj cluster vibe --apply --min-cluster-size 10Produces crates named Vibe/Dark-Techno (if LLM naming configured) or Vibe/Cluster-01 (fallback). Noise tracks go to Vibe/Unclassified. LLM naming via ~/.multidj/config.toml:
[llm]
base_url = "https://opencode.ai/api/v1"
api_key = "your-api-key"
model = "deepseek/deepseek-chat"multidj similar "AT NIGHT DUB" # 10 most similar tracks by cosine distance
multidj similar "Artist - Title" # partial match on artist+title
multidj similar /path/to/file.mp3 # exact path match
multidj similar "AT NIGHT DUB" --top 20multidj suggest "Artist - Title" # top 10 next-track candidates
multidj suggest "Artist - Title" --top 5 # show 5
multidj suggest "Artist - Title" --bpm-window 10 # tighter BPM tolerance (default: 15)
multidj suggest "Artist - Title" --any-cluster # search whole library, not just same Vibe/ clusterRanks candidates by a composite DJ-mixing score:
score = 0.70 × cosine_similarity
+ 0.15 × BPM_compatibility (linear decay to 0 at ±bpm-window BPM)
+ 0.15 × Camelot_key_compat (1.0 = same key, 0.75 = adjacent/relative, 0.0 = clash)
By default results come from the same Vibe/ cluster as the query track (run cluster vibe --apply first).
multidj parse # dry-run: propose artist/title/remixer from filenames
multidj parse --apply
multidj parse --min-confidence high
multidj parse --force # overwrite already-tagged fieldsmultidj enrich language # detect Hebrew tracks (Unicode range check)
# Metadata enrichment — requires uv sync --extra enrich
multidj enrich metadata # dry-run: file tags → Discogs → MusicBrainz
multidj enrich metadata --apply # write release_year, label, album, genre to DB
multidj enrich metadata --apply --write-tags # also update audio file tags
multidj enrich metadata --force # re-enrich already-enriched tracksConfigure Discogs and MusicBrainz tokens in ~/.multidj/config.toml under [discogs] and [musicbrainz].
Genre hardening (pipeline enrich_genre step). A layered decision tree fills each
track's genre with provenance — file → Discogs → MusicBrainz → CLAP zero-shot — and records
genre_source (file/discogs/musicbrainz/clap/manual) and genre_confidence on the
track (migration 008). It runs in the enrich phase of multidj pipeline. Tracks with
genre_source = 'manual' are never overwritten; the step is incremental (skips tracks that
already have a genre_source unless --force). Skip it with --skip-enrich-genre.
These tools run outside the CLI and produce self-contained HTML files:
# Interactive UMAP scatter plot — click a track to see next-track suggestions in sidebar
python scripts/viz_library.py
python scripts/viz_library.py --out /tmp/viz.html --neighbors 10
# Data science diagnostics — 6-panel dashboard: coverage, genre, BPM, key, similarity, clusters
python scripts/diagnostics.py
python scripts/diagnostics.py --out /tmp/diag.html --sample 600
# Zero-shot genre detection using CLAP + folder heuristics
python scripts/genre_detect.py --limit 50 # dry-run
python scripts/genre_detect.py --apply # write genres to DBmultidj report dashboard # generate interactive standalone dashboard
multidj report dashboard --output report.htmlThis command is read-only and respects --db.
Dashboard crate views include Camelot-based transition analysis between adjacent tracks:
- ✅ compatible: same key or +/- 1 step on same A/B ring
⚠️ risky: relative major/minor (same number, A/B swap)- ❌ incompatible: all other transitions
Current phase is analysis + visualization only:
- No DB schema changes
- No automatic crate mutation
- No pipeline enforcement of harmonic rules
- UI reordering/flags are local to the dashboard session (not persisted yet)
# Auto-crate dimensions: Genre: | BPM: | Key: | Energy: | Lang:
# Crate tiers: "catch-all" (New Crate) | "auto" (prefixed) | "hand-curated"
# Hand-curated and catch-all are always protected by default.
multidj crates audit # inventory and classification
multidj crates audit --summary # counts only
multidj crates audit --min-tracks 10
multidj crates hide # dry-run: hide auto crates below threshold
multidj crates hide --apply # sets show=0 (reversible)
multidj crates hide --include-hand-curated
multidj crates show # dry-run: restore hidden crates
multidj crates show --apply --min-tracks 5
multidj crates delete # dry-run: permanently delete auto crates
multidj crates delete --apply
multidj crates rebuild # dry-run: preview all auto-crates
multidj crates rebuild --apply # delete old auto-crates, recreate from current data
multidj crates rebuild --min-tracks 10crates rebuild generates crates for all enabled dimensions in config: Genre:, BPM:, Key: (Camelot wheel), Energy: (Low/Mid/High), Lang:. Vibe/ crates are managed separately by cluster vibe.
# Keeper: most played → highest rated → largest file
# --apply uses soft-delete (deleted=1, reversible)
multidj dedupe # dry-run: show groups with keeper/duplicate detail
multidj dedupe --by artist-title
multidj dedupe --by filesize
multidj dedupe --apply| Flag | Effect |
|---|---|
--json |
Structured JSON output (accepted anywhere in the command line) |
--db PATH |
Override MultiDJ DB path (default: ~/.multidj/library.sqlite) |
--version |
Show version |
Override DB path with env var: MULTIDJ_DB_PATH=/path/to/library.sqlite
- All commands are dry-run by default — nothing written without
--apply - Automatic timestamped backup before every write (skip with
--no-backup) pipeline --applycreates one backup at the start — not once per stepdedupe --applyandcrates delete --applyuse soft-delete (deleted=1) — recoverable- Per-track error isolation in all analyze commands — one bad audio file never aborts the batch
- All stats exclude soft-deleted tracks
sync mixxx --applybacks up the Mixxx DB before writing
MultiDJ owns ~/.multidj/library.sqlite. Mixxx is a sync target, not a source.
import mixxxis a one-time bootstrap — after that, manage everything in MultiDJsync mixxx --applyis a one-way push — MultiDJ → Mixxx always- Crates created directly in Mixxx are not imported and will be overwritten on next sync
- The
sync_statetable tracks dirty flags per-track per-adapter; an AFTER UPDATE trigger ontrackssetsdirty=1automatically whenever a track row changes full_synconly pushes tracks wheredirty=1 AND deleted=0— incremental by design
CLAUDE.mdis the authoritative architecture reference (layered module map, schema, and design invariants). High-level summary:
multidj/
├── cli.py — argparse entry point, global flag hoisting, subcommand dispatch
├── pipeline.py — run_pipeline(): 19 steps / 4 phases, single backup, step isolation
├── config.py / db.py — config.toml load/save; connect() + migration runner
├── backup.py / utils.py — timestamped backups; emit() JSON/human output
├── constants.py / models.py — genre lists, prefixes, CAMELOT_KEY_MAP; LibrarySummary
├── scan.py / audit.py — library stats; genre/metadata audits
├── clean.py / parse.py — text+genre normalization; filename parsing
├── analyze.py — analyze_bpm(), analyze_key(), analyze_energy()
├── embed.py / embed_clamp3.py — CLAP + CLaMP3 audio embedding backends
├── cluster.py / suggest.py — UMAP+HDBSCAN Vibe/ crates; DJ next-track ranking
├── cues.py — structural cue detection (intro/drop/outro)
├── enrich.py / enrich_genre.py — file→Discogs→MusicBrainz metadata; layered genre hardening
├── mixxx_blobs.py — BeatGrid-2.0 + KeyMap-1.0 protobuf BLOB writer
├── import_mixxx_analysis.py — one-way pull of Mixxx's own BPM/key into MultiDJ
├── crates.py / dedupe.py — auto-crate rebuild; duplicate detection
├── report.py — HTML dashboard generator
├── adapters/ — base.py (SyncAdapter ABC), mixxx.py, directory.py
└── migrations/ — 001…008 SQL, auto-applied in order on write connections
scripts/ — standalone HTML tools: viz_library.py, diagnostics.py, genre_detect.py
tests/ — full suite (386 tests); fixtures build fresh SQLite DBs per test
pytest tests/ -v
pytest tests/test_pipeline.py -v # single module| Path | Purpose |
|---|---|
~/.multidj/library.sqlite |
MultiDJ DB (source of truth); configurable via multidj config set-db |
~/.multidj/config.toml |
User configuration (crate dimensions, music dir, LLM API) |
~/.multidj/backups/ |
Timestamped backups |
~/.mixxx/mixxxdb.sqlite |
Mixxx DB (read on import, written on sync) |
~/.cache/huggingface/hub/models--laion--larger_clap_music/ |
CLAP model weights (~1.5 GB, downloaded once on first analyze embed --apply) |
- Pipeline restructured into 4 phases / 19 steps (ingest → analyze → enrich → sync), each
independently runnable via
--phase.--skip-<step>flags are available for every step. (Supersedes the "10 steps" / "14 steps" counts in the older notes below.) - Genre hardening: new
enrich_genrestep + migration 008 (genre_source,genre_confidence) — layered file→Discogs→MusicBrainz→CLAP with provenance. - Cue → Mixxx sync: intro/drop/outro pushed to hot-cue slots 0/1/2; MultiDJ never deletes
Mixxx hot cues (
cues clearclears the MultiDJ DB only). - BPM into Mixxx: sync now fills
library.bpmwhen Mixxx has none (a known-unstable stopgap). The reliable path remainsanalyze bpm→analyze mixxx-blobs(BeatGrid BLOB);mixxx_blobslogs per-track SKIPPED/WROTE for observability. - Branches consolidated: all work lives on
dev(mirrored tomaster).
[mixxx]config section added:~/.multidj/config.tomlnow supports[mixxx]with apathkey for the Mixxx DB location.get_mixxx_db_path()reads it. All Mixxx commands fall back to this when--mixxx-dbis omitted.- CLI fallback:
pipeline,sync mixxx,import mixxx, andanalyze mixxx-blobsuseargs.mixxx_db or get_mixxx_db_path(cfg)— no need to pass--mixxx-dbif[mixxx].pathis configured. - Idempotent pipeline: All analyze steps skip already-processed tracks (WHERE field IS NULL / LEFT JOIN check). Safe to re-run daily.
- Source-of-truth enforced via trigger: The
sync_statetable's AFTER UPDATE trigger ontrackssetsdirty=1automatically.full_syncpushes onlydirty=1 AND deleted=0tracks.
- Semantic embeddings:
multidj analyze embed --applyencodes tracks with CLAP (laion/larger_clap_music, 512-dim). Requiresuv sync --extra embeddings. Embeddings stored as BLOBs in theembeddingstable (migration 004). - Auto-clustering:
multidj cluster vibe --applyruns UMAP+HDBSCAN on the embedding matrix and writesVibe/auto-crates. LLM naming via[llm]config; falls back toVibe/Cluster-NN. - Similarity search:
multidj similar <track> [--top N]— KNN cosine-distance search in embedding space, read-only. - Pipeline now 14 steps:
analyze embed(step 8) andcluster vibe(step 9) added. Both skip gracefully if[embeddings]extra not installed.--skip-embedand--skip-clusterflags available.
- Clean text behavior now strips promotional noise markers from artist/title tails, including free, dl, and download variants.
- BPM analysis now samples start/middle/end windows and reports variable-tempo cases instead of hiding half/double-time ambiguity.
- Directory import now includes artist-title swap mismatch detection for stronger metadata hygiene during ingestion.
- Directory import now soft-deletes (
deleted=1) tracks whose files no longer exist on disk after a rescan. - Pipeline expanded to 10 steps:
fix_mismatches(step 2) auto-corrects artist/title swaps across all active tracks;clean_text(step 8) strips promo markers from artist/title/album. - Added persistent DB path config:
multidj config set-db <path>stores[db].path, and commands now use it when--dbis omitted. - Parse now skips junk artist/title proposals (numeric-only and
free/dl/downloadmarker values) to reduce bad suggestions in common use. - Added
multidj report dashboardfor standalone interactive HTML dashboard output with optional--outputpath. - Pipeline report step now generates the interactive dashboard by default while remaining read-only and non-fatal.
- Added experimental Camelot harmonic transition analysis/visualization in crate views (UI-only interactions, no DB persistence).