🤖 feat: agent memory — six-command memory tool, three scopes, curation UI, hot-set preloading (experiment) by ThomasK33 · Pull Request #3526 · coder/mux

ThomasK33 · 2026-06-11T21:41:02Z

Summary

Adds agent memory to Mux behind a new memory experiment (off by default): a provider-agnostic memory tool implementing Anthropic's six-command protocol (view/create/str_replace/insert/delete/rename), backed by a main-process MemoryService with three scopes (global / project / workspace), a tree-styled curation UI (right-sidebar Memory tab + Settings → Memory for global files), and usage-tracked hot-memory preloading with prompt-cache-stable injection.

Background

Agents forget durable user preferences and project facts between workspaces. This PR gives them a filesystem-like memory surface they can read/write via tool calls, plus automatic context injection so frequently-used/pinned memories are available without tool calls. Design follows Anthropic's memory tool semantics so models prompted for that protocol work out of the box, while staying provider-agnostic (flattened nullish schema; verified zero invalid tool calls across Anthropic/OpenAI/Google in dogfooding).

Implementation

Scopes (models only ever see virtual /memories/... paths):
- /memories/global/… → <muxHome>/memory/ (host-local, shared across projects)
- /memories/project/… → <checkout>/.mux/memory/ via the Runtime abstraction (git-tracked, works over SSH; remote writes via temp+mv)
- /memories/workspace/… → session dir (dies with the workspace)
Security envelope enforced once in MemoryService: pre-resolution traversal rejection, realpath parent-walk symlink containment (local + shell-quoted remote), atomic writes, 100KB/file + 1000 files/scope caps, self-healing loads, plain-text-only rendering (memory content is attacker-influenceable).
Access policy per agent class × scope: exec rw everywhere; plan-like read-only on project scope (tracked tree); explore view-only.
Context tiers: static tool description → per-request memory index (path + frontmatter description, system-message tail) → hot set (pinned + top-K by decayed usage, 16KB/item / 48KB total, recomputed only at session start/compaction boundaries for cache stability) → cold via tool calls.
Pins + usage stats live in a host-local sidecar (memory-meta.json) keyed by logical identity — never in git-tracked files (a cloned repo must not be able to force itself into the hot tier).
UI: shared MemoryBrowser/MemoryFileEditor render scope sections with a native directory tree (chevrons, indent guides, recursive counts), descriptions, usage stats, pin/delete row actions, agent-edited badges via memory.onChange, and sha256-conflict-safe saves. Settings → Memory manages global files workspace-independently (workspaceId is nullish on memory.* routes). Server-side experiment gating on every route.

Validation

Three blocking dogfood gates with screenshot/recording/raw-request evidence:
- G1: cross-workspace remember→recall unprompted across 3 providers (11 calls, 0 invalid); index injection verified in raw provider requests; bounded re-cache on index change (tool/schema prefix stays cached); SSH project-scope round-trip (jest integration test against the sshd fixture).
- G2: UI edit visible to a running agent's next view; conflicting save surfaces a conflict banner with no silent clobber; pin survives reload.
- G3: pinned memory canary answered from context with zero tool calls (proven in raw request bytes); hot set byte-identical across turns (sha256 match, 18.8k cache-read); index ≈434 chars + hot set ≈578 chars measured.
~150 unit/integration tests across memoryService (security matrix, concurrency, remote runtime), tool dispatch (18 mode-matrix cells), router (experiment-off rejection, sha preconditions), sidecar self-healing, hot-set selection/caching, and UI flows.
make static-check green.

Risks

Off-by-default experiment: with it off there is no tool, no index, no tab, and routes reject — zero behavior change for existing users.
Highest-regression-risk touchpoints are aiService/streamContextBuilder (system-message assembly) and agentSession (hot-block caching); both are additive and experiment-gated, with tests asserting byte-stable output and zero context change when disabled.
Known limitations documented in code: no cross-process file locking (single main process assumption); mid-session pins take effect at the next session segment by design.

Pains

dev-server-sandbox default seeding resumed parent-instance tasks and deleted a live agent worktree during gate G3 (recovered from committed state; --clean-projects avoids it) — worth a separate fix.
Seeded gateway credentials return 403 in sandboxes, so the gates had to wire providers from env keys.

📋 Implementation Plan

Agent Memory in Mux — Implementation Plan

Adopt Anthropic's six-command memory protocol (view, create, str_replace, insert, delete, rename) as a provider-agnostic Mux tool, backed by a main-process MemoryService with three scopes (global / project / workspace), a curation UI in the right sidebar, and usage-tracked hot-memory preloading. Everything is gated behind a new memory experiment.

Grounded in: deep-research findings (verified against Anthropic SDK/docs, Vercel AI SDK, MCP memory server) + four repo investigations (experiments system, runtime abstraction, tool pipeline, service/IPC/UI patterns).

Locked decisions

| Decision | Choice |
| --- | --- |
| `create` on existing file | Error (current Anthropic spec). Overwrite = `delete` + `create`. Documented in tool description |
| Project memories in git | Committed by default (reviewable in PRs via normal Review pane flow) |
| Scope rollout | All three scopes at once (global, project, workspace) |
| Runtimes | Must work on all runtimes (local, worktree, SSH); project scope goes through the `Runtime` abstraction |
| Gating | `memory` experiment (`EXPERIMENT_IDS.MEMORY`), off by default, user-overridable, shown in Settings |

Architecture

flowchart TD
    Agent["Agent tool call<br/>memory (6 commands)"] --> Norm["Zod preprocess + flattened<br/>nullish schema"]
    UI["Memory tab (right sidebar)<br/>view / edit / pin / delete"] -->|oRPC| Svc
    Norm --> Svc["MemoryService (main process)<br/>path validation · per-root mutex ·<br/>atomic writes · stats · events"]
    Svc -->|"local fs"| G["~/.mux/memory/ (global)"]
    Svc -->|"runtime.readFile/writeFile/exec<br/>(works over SSH)"| P["&lt;checkout&gt;/.mux/memory/<br/>(project, git-tracked)"]
    Svc -->|"local fs"| W["~/.mux/sessions/&lt;ws&gt;/memory/<br/>(workspace, ephemeral)"]
    Svc -.->|"EventEmitter → oRPC eventIterator"| UI
    Svc -.->|"index → late context block<br/>hot set → context injection (P3)"| Ctx["Model context"]

Scopes and physical mapping

Models only ever see virtual paths; MemoryService maps them. Physical paths never leak into context.

| Virtual path | Physical location | Access | Lifecycle |
| --- | --- | --- | --- |
| `/memories/global/…` | `<muxHome>/memory/` on the host (`getMuxHome()` from `src/common/constants/paths.ts`) | local fs | Permanent; shared across all projects |
| `/memories/project/…` | `<workspace checkout>/.mux/memory/` | `Runtime` (`runtime.readFile`/`stat`/`ensureDir` + `exec` for listing/realpath); writes: `write-file-atomic` on local runtimes, `SSHRuntime.writeFile` (temp+`mv`) on remote | Git-tracked; travels with worktrees; merges via normal git flow |
| `/memories/workspace/…` | `config.getSessionDir(workspaceId)/memory/` on the host | local fs | Deleted with the workspace |
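
The mapping in the table above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `parseVirtualPath`, `scopeRoot`, and the `ScopeRoots` shape are hypothetical names, and the real service resolves its roots through `getMuxHome()`, the `Runtime` abstraction, and `config.getSessionDir`.

```typescript
type MemoryScope = "global" | "project" | "workspace";

interface ParsedMemoryPath {
  scope: MemoryScope;
  relPath: string; // path relative to the scope root; "" means the scope root itself
}

const VIRTUAL_ROOT = "/memories";
const SCOPES: readonly MemoryScope[] = ["global", "project", "workspace"];

// Split a model-facing virtual path into scope + scope-relative path.
// Returns null for anything outside /memories/<scope>/...
function parseVirtualPath(virtualPath: string): ParsedMemoryPath | null {
  if (!virtualPath.startsWith(VIRTUAL_ROOT + "/")) return null;
  const rest = virtualPath.slice(VIRTUAL_ROOT.length + 1);
  const [head, ...tail] = rest.split("/");
  const scope = SCOPES.find((s) => s === head);
  if (scope === undefined) return null;
  return { scope, relPath: tail.filter((seg) => seg.length > 0).join("/") };
}

// Hypothetical container for the physical roots (assumption, not the PR's type).
interface ScopeRoots {
  muxHome: string;
  checkoutRoot: string | null; // null when no single project checkout exists
  sessionDir: string;
}

// Physical roots stay on the service side and are never echoed back to the model.
function scopeRoot(scope: MemoryScope, roots: ScopeRoots): string | null {
  switch (scope) {
    case "global":
      return `${roots.muxHome}/memory`;
    case "project":
      return roots.checkoutRoot === null ? null : `${roots.checkoutRoot}/.mux/memory`;
    case "workspace":
      return `${roots.sessionDir}/memory`;
  }
}
```

Traversal and symlink checks on `relPath` are a separate step and are deliberately left out of this sketch.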

Cross-runtime notes (verified):

Tool handlers run in the main process and already receive config.runtime + config.cwd; readFileString/writeFileString (src/node/utils/runtime/helpers.ts) work transparently over SSH (SSHRuntime.writeFile is already atomic: cat > tmp && mv).
Session dirs are always host-local, even for SSH workspaces → workspace scope is uniform everywhere.
Global scope is host-local by definition (the main process runs on the host); precedent: global skills always resolve through a local runtime (resolveGlobalRuntime in agent skills).
Remote listing mirrors skills discovery: RemoteRuntime → execBuffered(runtime, "find …") (see listSkillDirectoriesFromRuntime in src/node/services/agentSkills/agentSkillsService.ts).

Tool surface (model-facing)

One memory tool. Flattened object schema (not a discriminated union — strict-mode providers flatten unions poorly): command enum + all op-specific fields .nullish(), dispatch in the handler with != null checks. This follows the repo's documented conventions in src/common/utils/tools/toolDefinitions.ts.

command: "view" | "create" | "str_replace" | "insert" | "delete" | "rename"
path, file_text, old_str, new_str, insert_line, insert_text, old_path, new_path  → all .nullish()
offset, limit  → .nullish() numbers (replaces Anthropic's view_range tuple; mirrors file_read)
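
As a rough illustration of the flattened shape above, the handler-side dispatch can be reduced to `!= null` checks over a per-command list of required fields. The `REQUIRED_FIELDS` table and `missingFields` helper are assumptions made for this sketch (the PR defines its schema with Zod in `toolDefinitions.ts`, which is not reproduced here), and the exact required set per command is a guess based on the command semantics.

```typescript
type MemoryCommand = "view" | "create" | "str_replace" | "insert" | "delete" | "rename";

// Flattened input: one object, every op-specific field optional and nullable.
interface MemoryToolInput {
  command: MemoryCommand;
  path?: string | null;
  file_text?: string | null;
  old_str?: string | null;
  new_str?: string | null;
  insert_line?: number | null;
  insert_text?: string | null;
  old_path?: string | null;
  new_path?: string | null;
  offset?: number | null;
  limit?: number | null;
}

type FieldName = Exclude<keyof MemoryToolInput, "command">;

// Assumed required fields per command (illustrative only).
const REQUIRED_FIELDS: Record<MemoryCommand, readonly FieldName[]> = {
  view: ["path"],
  create: ["path", "file_text"],
  str_replace: ["path", "old_str", "new_str"],
  insert: ["path", "insert_line", "insert_text"],
  delete: ["path"],
  rename: ["old_path", "new_path"],
};

// `== null` treats both null and undefined as "not provided",
// while empty strings and 0 still count as provided.
function missingFields(input: MemoryToolInput): FieldName[] {
  return REQUIRED_FIELDS[input.command].filter((field) => input[field] == null);
}
```

A handler built this way can turn a non-empty result into a recoverable tool error naming the missing fields, which gives the model something concrete to retry with.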

Zod preprocessor shims (same mechanism as bash command→script): file_path/filePath→path, content→file_text, old_string→old_str, new_string→new_str.
Reference semantics copied from the Anthropic SDK: view on a directory lists ≤2 levels deep, excludes dotfiles; str_replace requires unique old_str and returns matching line numbers on ambiguity (recoverable tool error); insert at line N; rename for move; delete for file/dir.
create errors on existing files (locked decision).
Static tool description; dynamic index in a late context block. The tool description carries only the protocol ("check relevant memories before acting; record durable facts/preferences") + command semantics, so the cached tool/schema prefix stays stable. The per-request memory index (virtual path + one-line description per file, plus a pinned marker once P2 sidecar pins exist) is injected at stream-context assembly (buildStreamSystemContext in src/node/services/streamContextBuilder.ts) as a late context block — preferred placement: transient system-reminder-style block near the tail of the message list (cache-optimal; the tail changes every turn anyway), falling back to an end-of-system-message section if no tail-injection mechanism fits cleanly. This deliberately diverges from the skills index (which lives in agent_skill_read's description): skills change rarely, memory files change mid-session, and tool-description churn would invalidate the cached tool prefix. G1 measures cache hit rates to validate the placement.
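
The `str_replace` rule described above (unique `old_str`, line numbers reported on ambiguity) can be sketched like this. `strReplaceUnique` and its result type are hypothetical names for illustration; the error strings are placeholders, not the wording used by the Anthropic SDK or by this PR.

```typescript
type StrReplaceResult =
  | { ok: true; content: string }
  | { ok: false; error: string; lines: number[] };

function strReplaceUnique(content: string, oldStr: string, newStr: string): StrReplaceResult {
  if (oldStr.length === 0) {
    return { ok: false, error: "old_str must not be empty", lines: [] };
  }

  // Collect the offset of every occurrence (overlapping ones included).
  const offsets: number[] = [];
  let from = content.indexOf(oldStr);
  while (from !== -1) {
    offsets.push(from);
    from = content.indexOf(oldStr, from + 1);
  }

  if (offsets.length === 0) {
    return { ok: false, error: "old_str not found", lines: [] };
  }
  if (offsets.length > 1) {
    // 1-based line number of each match, so the model can disambiguate and retry.
    const lines = offsets.map((offset) => content.slice(0, offset).split("\n").length);
    return { ok: false, error: `old_str matches ${offsets.length} times`, lines };
  }

  // Exactly one match: splice by offset rather than String.replace,
  // so "$" sequences in newStr are never interpreted as replacement patterns.
  const at = content.indexOf(oldStr);
  return { ok: true, content: content.slice(0, at) + newStr + content.slice(at + oldStr.length) };
}
```

Because a stale `old_str` simply fails to match, the same check doubles as the optimistic-concurrency guard for agent edits.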

File format

Markdown with optional YAML frontmatter carrying display metadata only (description, used for the index — same repo trust level as the existing skills index). Pins and usage stats never live in the files: a committed pinned: true in a cloned repo would force attacker-chosen content into the hot context tier, and stats in git-tracked files create commit noise/merge conflicts. Both live in a host-local, service-owned sidecar; pinning is a user/UI action only.
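
Since the `description` field is repo-controlled and ends up in the index, it needs to be rendered defensively. The following is a minimal sketch of one way to do that (single line, control characters removed, capped at roughly 200 characters, quoted as data); `sanitizeIndexDescription` and the exact cap are assumptions for illustration, not the PR's implementation.

```typescript
const MAX_DESCRIPTION_CHARS = 200; // assumed cap, per the "~200 chars" figure in the plan

function sanitizeIndexDescription(raw: string): string {
  const singleLine = raw
    .replace(/[\u0000-\u001f\u007f]+/g, " ") // control characters (incl. newlines, tabs) become a space
    .replace(/\s+/g, " ") // collapse whitespace runs
    .trim();

  const truncated =
    singleLine.length > MAX_DESCRIPTION_CHARS
      ? singleLine.slice(0, MAX_DESCRIPTION_CHARS - 3) + "..."
      : singleLine;

  // JSON.stringify wraps the text in quotes and escapes embedded quotes/backslashes,
  // so the index line reads as quoted data rather than as free-form instructions.
  return JSON.stringify(truncated);
}
```

For example, a description spanning two lines collapses to one quoted line, and an over-long one is cut to the cap before quoting.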

Security envelope (enforced once, in MemoryService)

Verified against Anthropic's reference implementations + existing skills containment checks:

Reject absolute paths, ~, .. segments, URL-encoded traversal before resolution; then resolve and enforce prefix containment under the scope root.
Symlink escape prevention: local scopes via fs.realpath walk (parent-walking when components don't exist — Python SDK _validate_no_symlink_escape pattern); project scope on remote runtimes via runtime.exec with every path shell-quoted, the root ensureDir'd first, then a remote realpath parent-walk containment check before any mutation or listing.
Atomic writes: all local-disk writes (global, workspace, and project scope on local/worktree runtimes) go through service-level write-file-atomic (already a repo dependency) — never raw streams; project scope on remote runtimes uses SSHRuntime.writeFile (already temp+mv).
Index hardening: frontmatter description is repo-controlled — the index renders it single-line, truncated (~200 chars), control characters stripped, quoted as data; the index block notes that project memory metadata/content is untrusted input until the user/agent chooses to rely on it.
Caps: 100KB/file, 1,000 files/scope (constants in src/common/constants/); str_replace/insert only on UTF-8 text.
Self-healing: malformed frontmatter/stats files are skipped/sanitized at load — never brick a workspace (crash-resilience doctrine).
Renderer: memory content is attacker-influenceable (project memories arrive via cloned repos) — render as plain text/React trees, no innerHTML-family sinks; no SECURITY-AUDIT-worthy sinks added.
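
The first rule in the list above (reject suspicious input before resolution, then enforce prefix containment) might look roughly like the sketch below. `resolveInsideRoot` is a hypothetical helper, the backslash and malformed-percent rejections are extra conservative assumptions, and the symlink/realpath walk is intentionally not covered here.

```typescript
import * as path from "node:path";

// Returns the resolved physical path, or null when the input must be rejected.
function resolveInsideRoot(scopeRootDir: string, relPath: string): string | null {
  // Decode once so URL-encoded traversal ("%2e%2e/") is checked as well.
  let decoded: string;
  try {
    decoded = decodeURIComponent(relPath);
  } catch {
    return null; // malformed percent-encoding: reject rather than guess (assumption)
  }

  for (const candidate of [relPath, decoded]) {
    if (candidate.includes("\0")) return null; // NUL bytes
    if (candidate.startsWith("/") || candidate.startsWith("~")) return null; // absolute or home-relative
    if (candidate.includes("\\")) return null; // assumption: no backslash separators
    if (candidate.split("/").includes("..")) return null; // parent-directory segments
  }

  // Only now resolve, then enforce prefix containment under the scope root.
  const root = path.posix.resolve(scopeRootDir);
  const resolved = path.posix.resolve(root, relPath);
  if (resolved !== root && !resolved.startsWith(root + "/")) return null;
  return resolved;
}
```

Checking both the raw and the decoded form keeps the pre-resolution rejection independent of whatever later layer might decode the path.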

Concurrency & conflicts

All writes (agent tool + UI) funnel through MemoryService → MutexMap keyed by physical root (src/node/utils/concurrency/mutexMap.ts) eliminates intra-process races. No filesystem locking in v1 (single main process; the dev-server sandbox uses its own MUX_ROOT, so cross-process collisions on the same global root are out of scope — documented limitation).
Agent str_replace is naturally optimistic (old_str must match → recoverable tool error on conflict).
UI saves carry a contentSha256 captured at load; service rejects on mismatch (re-read & retry prompt) — the verified optimistic-concurrency pattern.
Project scope across parallel agents rides worktree isolation + git merge (no runtime contention by construction); only global scope truly contends, and it's serialized by the mutex.
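
The UI-save precondition can be illustrated with a small in-memory sketch: the editor captures a SHA-256 at load and the store rejects a save whose expected hash no longer matches. `InMemoryStore`, `SaveResult`, and `sha256Hex` are hypothetical names for this example; in the real service the compare-and-write would run under the per-root mutex and write through the atomic file helpers.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

type SaveResult =
  | { ok: true; sha256: string }
  | { ok: false; reason: "conflict"; currentSha256: string };

class InMemoryStore {
  private readonly files = new Map<string, string>();

  read(filePath: string): { content: string; sha256: string } {
    const content = this.files.get(filePath) ?? "";
    return { content, sha256: sha256Hex(content) };
  }

  // Write only if the file still has the content the caller loaded.
  save(filePath: string, content: string, expectedSha256: string): SaveResult {
    const currentSha256 = sha256Hex(this.files.get(filePath) ?? "");
    if (currentSha256 !== expectedSha256) {
      return { ok: false, reason: "conflict", currentSha256 };
    }
    this.files.set(filePath, content);
    return { ok: true, sha256: sha256Hex(content) };
  }
}
```

A caller that receives `conflict` would re-read the file and prompt the user to retry, which is the behavior gate G2 exercises.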

Experiment gating (`memory`)

Define MEMORY: "memory" in EXPERIMENT_IDS + registry entry in EXPERIMENTS (src/common/constants/experiments.ts): enabledByDefault: false, userOverridable: true, showInSettings: true → appears automatically in Settings → Experiments.
Backend: aiService.ts resolves experimentsService.isExperimentEnabled(EXPERIMENT_IDS.MEMORY) (with client override passthrough, same as DYNAMIC_WORKFLOWS) → passed into getToolsForModel (src/common/utils/tools/tools.ts) to conditionally register the memory tool. Experiment off ⇒ no tool, no index, zero context cost.
Frontend: Memory tab gated via featureFlag: EXPERIMENT_IDS.MEMORY in TAB_CONFIG_DEF (src/browser/features/RightSidebar/Tabs/tabConfig.ts); components use useExperimentValue(EXPERIMENT_IDS.MEMORY).
Server-side enforcement: the oRPC memory.* routes themselves check isExperimentEnabled and reject when disabled — UI hiding alone is not the gate.
Tests: backend spyOn(experimentsService, "isExperimentEnabled"); frontend spyOn(ExperimentsModule, "useExperimentValue") (established patterns).

Mode / sub-agent write policy (command-level, enforced in the handler)

The memory tool handler receives a memoryAccess policy via ToolConfiguration (alongside the existing planFileOnly plumbing used by validatePlanModeAccess in src/node/services/tools/fileCommon.ts) and enforces it per command + scope — not via regex tool policy, which can only match tool names:

Agent context	global	project	workspace
Exec-like	read/write	read/write	read/write
Plan-like	read/write	read-only	read/write
Explore/read-only	read-only	read-only	read-only

Rationale: plan mode must not mutate the tracked source tree, and project memories are git-tracked — so project scope is read-only in plan mode, while global/workspace writes (agent-owned, untracked state) stay allowed for capturing lessons during planning. Exec-mode project writes surface in the Review pane like any diff (a feature). Tests assert mutating commands are rejected per matrix cell.
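
The matrix above translates fairly directly into a lookup table plus a set of mutating commands. This is a sketch under the assumption that `view` is the only non-mutating command; `MEMORY_ACCESS` and `isCommandAllowed` are illustrative names, not the `memoryAccess` plumbing added in this PR.

```typescript
type AgentClass = "exec" | "plan" | "explore";
type MemoryScope = "global" | "project" | "workspace";
type MemoryCommand = "view" | "create" | "str_replace" | "insert" | "delete" | "rename";
type Access = "read-write" | "read-only";

// One cell per agent class x scope, mirroring the table.
const MEMORY_ACCESS: Record<AgentClass, Record<MemoryScope, Access>> = {
  exec: { global: "read-write", project: "read-write", workspace: "read-write" },
  plan: { global: "read-write", project: "read-only", workspace: "read-write" },
  explore: { global: "read-only", project: "read-only", workspace: "read-only" },
};

// Assumption: everything except `view` mutates state.
const MUTATING_COMMANDS: ReadonlySet<MemoryCommand> = new Set<MemoryCommand>([
  "create",
  "str_replace",
  "insert",
  "delete",
  "rename",
]);

function isCommandAllowed(agent: AgentClass, scope: MemoryScope, command: MemoryCommand): boolean {
  if (!MUTATING_COMMANDS.has(command)) return true; // reads are allowed in every cell
  return MEMORY_ACCESS[agent][scope] === "read-write";
}
```

A `rename` that crosses scopes would need this check applied to both the source and the destination scope.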

Phases

Phase 0 — Experiment flag + constants (~40 net LoC)

src/common/constants/experiments.ts: MEMORY id + definition.
src/common/constants/memory.ts (new): scope ids, virtual root (/memories), caps, char budgets.
Settings → Experiments row appears automatically.

Phase 1 — MemoryService + `memory` tool (agent-facing MVP) (~1,200 net LoC)

Files:

src/node/services/memoryService.ts (new, ~400): scope mapping, path validation + symlink containment (local + via-runtime), six command implementations, per-root MutexMap, caps, EventEmitter change events {scope, path, actor, workspaceId}. Constructor-injected via ServiceContainer/coreServices (existing DI pattern).
src/node/services/tools/memory.ts (new, ~250): ToolFactory dispatching to MemoryService; static tool description (protocol + semantics); per-command+scope memoryAccess enforcement; recoverable error strings copied from the Anthropic SDK semantics.
src/node/services/streamContextBuilder.ts (~60): build + inject the per-request memory index context block (experiment-gated).
src/common/utils/tools/toolDefinitions.ts (~80): flattened schema + preprocessor shims.
src/common/utils/tools/tools.ts + src/node/services/aiService.ts (~40): registration in runtimeTools (project scope needs workspace init), experiment-gated.
src/browser/features/Tools/Shared/getToolComponent.ts + ToolPrimitives.tsx + src/browser/features/Tools/MemoryToolCall.tsx (new, ~150): chat renderer (per-command compact display), lucide icon mapping (no emoji).
Sub-agents: tool availability per existing tool policy; write permissions per the command-level memoryAccess matrix above (explore agents get view only).

Tests (not counted in LoC): traversal/symlink/atomicity/create-errors/str_replace-ambiguity unit tests against a temp dir (incl. local project create/edit atomicity); per-command handler tests; mode-matrix rejections per cell (plan-mode project mutations reject; explore mutations reject); remote containment (shell-quoted traversal + symlink-escape attempts through the runtime, SSH fixture); cross-workspace remember→recall integration test; experiment-off ⇒ tool absent.

Gate G1 (dogfood, blocking):

1. make dev-server-sandbox (isolated MUX_ROOT, free ports, per the dev-server-sandbox skill). Bootstrap browser workflows first: agent-browser skills get core.
2. Drive the web UI with agent-browser (per agent-browser/dogfood skills): enable the memory experiment in Settings; in workspace A tell the agent "remember that I prefer X"; verify file exists under the sandbox MUX_ROOT/memory/… (bash); create workspace B; ask "what do you know about my preferences?" → agent recalls via view unprompted.
3. Repeat the loop across ≥3 providers (Anthropic, OpenAI, Google) and record invalid-tool-call rates; this empirically settles the flattened-schema question from the research.
4. Verify index injection with bun run debug ui-messages --workspace <name>; record provider cache-read/cache-write token stats across consecutive turns (and across a mid-session memory create) to confirm the index placement preserves prompt caching.
5. SSH runtime: integration test against SSHRuntime (localhost sshd or existing SSH test fixtures) covering a project-scope create/view round-trip.
6. Evidence: annotated screenshots (agent-browser screenshot --annotate) + repro video (record start/stop) attached via attach_file.

Phase 2 — Curation UI (Memory tab) (~750 net LoC)

Files:

src/common/orpc/schemas/api.ts + src/node/orpc/router.ts (~180): memory.list (bulk: all scopes, one call), memory.read, memory.save (with expectedSha256), memory.delete, memory.setPinned, memory.onChange (eventIterator, async-generator bridge off the service's EventEmitter — same shape as workspace.onMetadata). All routes take workspaceId; the router resolves that workspace's metadata + active Runtime + checkout cwd through the existing WorkspaceService/runtime plumbing (same as plan-file IPC) so project scope never touches local fs paths on SSH workspaces. Routes reject when the experiment is off.
src/node/services/memoryMeta.ts (new, ~80): host-local sidecar store (<muxHome>/memory-meta.json) introduced here with schema {pinned} per logical key, atomic persistence (write-file-atomic), self-healing load (corrupt file ⇒ start empty + log). P3 extends the schema with usage stats.
tabConfig.ts + tabRegistry.tsx (~30): memory tab with featureFlag.
src/browser/features/RightSidebar/Memory/* (new, ~450): scope-grouped file list, Markdown view/edit, pin toggle, delete, "agent edited" badge from change events, conflict banner on 409-style save rejection. Leaf components subscribe directly (colocation rule); usePersistedState only for pure UI state (expanded groups); pins live in the service-owned host-local sidecar (introduced here for pins; P3 extends it with usage stats), not in files or localStorage. Keyboard shortcut in KEYBINDS (hidden on mobile). Conditional rendering (no Radix portals) for happy-dom testability.

Tests: UI tests for list/edit/conflict flows; router tests for sha precondition and experiment-off ⇒ memory.* routes reject; sidecar self-healing (corrupt meta file ⇒ empty, no crash).

Gate G2 (dogfood, blocking): agent-browser session — edit a memory in the panel while an agent task is running → agent's next view sees the edit; force a conflicting save (agent writes between UI load and UI save) → conflict surfaces, no silent clobber; pin a file → badge/state survives reload. Screenshots + recording attached.

Phase 3 — Usage stats, hot set, preloading (~400 net LoC)

Sidecar outside the memory roots (host-local, e.g. <muxHome>/memory-meta.json, atomic writes): {lastAccessedAt, accessCount, lastWriteAt, pinned} keyed by logical identity, not physical path — global:<relpath>, project:<projectId>:<relpath>, workspace:<workspaceId>:<relpath>. projectId = the existing stable project identity from Mux config where available, else a hash of {runtimeKind, remote host identity, normalized project root} — never the physical worktree path, so N worktrees of one repo share heat/pins, branch-agnostic by relpath. Recorded at the MemoryService chokepoint; never git-tracked.
Three context tiers: index (always, late context block — P1) → hot set (user-pinned via sidecar + top-K by recency/frequency under budgets, e.g. 16KB/item, 48KB total in src/common/constants/memory.ts) → cold (tool call). Auto-hot selection is gated on actual local usage (sidecar stats), which a cloned repo cannot fake.
Hot-set injection mirrors the post-compaction loaded-skill snapshot mechanism (loadedSkillSnapshots.ts / compactionHandler.ts / attachments.ts budgets): recomputed only at session start and compaction boundaries — never per turn — to preserve provider prompt caching.
Heat-metric guards: writes/pins count as uses; age decay; known limitation (self-reinforcing hot set) documented; demotion-sampling deferred to P4.
Memory tab shows lastAccessed/accessCount.
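
A minimal sketch of the hot-set selection described in this phase: pinned files first, then unpinned files ranked by half-life-decayed access count, filled greedily under the per-item and total budgets. The one-week half-life, the choice to skip (rather than truncate) oversized items, and all names here are assumptions for illustration; the keys follow the logical-identity format listed above.

```typescript
interface HotCandidate {
  key: string; // logical key, e.g. "global:<relpath>" or "project:<projectId>:<relpath>"
  sizeBytes: number;
  pinned: boolean;
  accessCount: number;
  lastAccessedAt: number; // epoch milliseconds
}

const MAX_ITEM_BYTES = 16 * 1024;
const MAX_TOTAL_BYTES = 48 * 1024;
const HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000; // hypothetical half-life of one week

// Access count halves for every HALF_LIFE_MS since the last access.
function decayedScore(candidate: HotCandidate, now: number): number {
  const ageMs = Math.max(0, now - candidate.lastAccessedAt);
  return candidate.accessCount * Math.pow(0.5, ageMs / HALF_LIFE_MS);
}

function selectHotSet(candidates: readonly HotCandidate[], now: number): string[] {
  const ranked = candidates
    .filter((c) => c.pinned || c.accessCount > 0) // never-used, unpinned files stay cold
    .sort((a, b) => {
      if (a.pinned !== b.pinned) return a.pinned ? -1 : 1; // pinned rank first
      const diff = decayedScore(b, now) - decayedScore(a, now);
      if (diff !== 0) return diff;
      return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; // deterministic tie-break
    });

  const selected: string[] = [];
  let totalBytes = 0;
  for (const candidate of ranked) {
    if (candidate.sizeBytes > MAX_ITEM_BYTES) continue; // assumption: skip oversized items
    if (totalBytes + candidate.sizeBytes > MAX_TOTAL_BYTES) continue; // greedy: try smaller ones
    selected.push(candidate.key);
    totalBytes += candidate.sizeBytes;
  }
  return selected;
}
```

Because the ranking depends on `now`, calling this only at session start and compaction boundaries (rather than per turn) is what keeps the injected block stable between turns.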

Gate G3 (dogfood, blocking): pinned memory provably in context without a tool call (bun run debug ui-messages); hot set byte-identical across consecutive turns within a session (cache stability); measured index + hot-set char/token cost reported with screenshots.

Phase 4 — Later (not in this change set)

Pre-compaction "flush working context to memory" warning (compactionHandler hook); demotion-sampling experiment; cross-process locking if the global root is ever shared between processes; memory sync/import-export.

Total v1 (P0–P3): ~2,400 net LoC product code (range 2,300–2,900; P1 carries the plumbing risk).

Risks / notes

Strict-mode schema behavior is the biggest unknown → retired empirically at G1 step 3 before any UI work.
Index placement vs. caching: if no clean tail-injection mechanism exists, the end-of-system-message fallback invalidates message-history cache on mid-session index changes; G1 cache telemetry decides whether more work is needed before P2.
Prompt-injection: index lines come from repo-controlled frontmatter — same trust level as the existing skills index; content stays behind explicit tool calls or budget-capped hot tier.
.mux/memory + dotfile exclusion: the project root maps inside .mux/memory/, so the view dotfile-exclusion rule applies to its contents, not the root itself.
Upgrade/downgrade safe: experiment off by default; memory dirs are plain files; no migrations.

Acceptance criteria

- With the experiment off: no memory tool in any request, no tab, the UI makes no memory calls, and the server rejects direct memory.* oRPC calls; zero context-size change.
- With it on: agent can view/create/str_replace/insert/delete/rename in all three scopes on local and SSH workspaces; create errors on existing files; traversal/symlink escapes rejected on both local and remote, and the mode/sub-agent access matrix is enforced (tests prove each cell).
- Memories persist across workspaces (global), travel with branches/PRs (project), die with the workspace (workspace).
- Users can view/edit/pin/delete every memory in the Memory tab with live agent-edit updates and conflict-safe saves.
- Pinned/hot memories appear in context without tool calls, within fixed budgets, cache-stable within a session.
- make static-check + targeted tests green; all three dogfood gates passed with attached evidence.

Generated with mux • Model: anthropic:claude-fable-5 • Thinking: max • Cost: $305.22

Signed-off-by: Thomas Kosiewski <tk@coder.com>

… and change events Signed-off-by: Thomas Kosiewski <tk@coder.com>

… per-scope write policy Signed-off-by: Thomas Kosiewski <tk@coder.com>

…t index injection, and coreServices Signed-off-by: Thomas Kosiewski <tk@coder.com>

…r-command display Signed-off-by: Thomas Kosiewski <tk@coder.com>

…built-in skill content Signed-off-by: Thomas Kosiewski <tk@coder.com>

…l identity Signed-off-by: Thomas Kosiewski <tk@coder.com>

…index entries carry scope+relPath Signed-off-by: Thomas Kosiewski <tk@coder.com>

…ve/delete/setPinned/onChange) Signed-off-by: Thomas Kosiewski <tk@coder.com>

…, editor, pins, and live agent-edit badges Signed-off-by: Thomas Kosiewski <tk@coder.com>

Signed-off-by: Thomas Kosiewski <tk@coder.com>

…#1) Signed-off-by: Thomas Kosiewski <tk@coder.com>

…c fix Signed-off-by: Thomas Kosiewski <tk@coder.com>

…sedAt, lastWriteAt) Pins now count as uses; unpinning preserves stats. Adds renameKeys/removeKeys subtree maintenance and self-healing sanitization for malformed stats fields. Signed-off-by: Thomas Kosiewski <tk@coder.com>

…ogical keys Every successful agent command (view/create/str_replace/insert/rename) and UI read/save records a use in the sidecar; deletes and renames maintain sidecar entries subtree-aware. MemoryScopeContext gains the stable projectPath identity from Mux config (never the physical worktree path). Signed-off-by: Thomas Kosiewski <tk@coder.com>

…usted-content block Pinned files rank first; auto-hot files rank by half-life-decayed access frequency. Greedy fill under 16KB/item + 48KB total budgets (constants in src/common/constants/memory.ts); preloading bypasses usage recording. Signed-off-by: Thomas Kosiewski <tk@coder.com>

…ndaries AgentSession caches the rendered block per session segment (undefined = not computed, null = nothing to inject) and invalidates it only when a pending compaction boundary is consumed, keeping the injected bytes prompt-cache-stable. AIService.buildHotMemoriesBlock gates on the memory experiment and self-heals to null; streamContextBuilder appends the block after the memory index. Signed-off-by: Thomas Kosiewski <tk@coder.com>

… tab memory.list now returns sidecar usage stats per file; rows render 'Used N× · <relative time>' for used files only. Signed-off-by: Thomas Kosiewski <tk@coder.com>

…egration test Gate G1 step 5: verifies create/view/strReplace and create-on-existing rejection for project-scope memories through a real SSHRuntime against the Docker sshd fixture (TEST_INTEGRATION=1 bun x jest tests/runtime/memory-ssh.test.ts). Signed-off-by: Thomas Kosiewski <tk@coder.com>

…nstructor and projectPath

memory.* oRPC inputs now take a nullish workspaceId. Without one, the router resolves a stub scope context (runtime: null) so only the global scope is reachable; project/workspace paths fail with recoverable errors, mirroring the existing workspace-scope guard. This unblocks a Settings-level UI for global memories that has no workspace at hand. Signed-off-by: Thomas Kosiewski <tk@coder.com>

…moryBrowser Extracts the list+editor into src/browser/features/Memory/ (MemoryBrowser + MemoryFileEditor) parameterized by workspaceId | null and a scope filter so Settings can reuse them. The list now renders collapsible scope sections with counts, bordered rows with a FileText icon, muted dir prefixes for nested paths, counter-nums usage stats, a filled accent pin indicator, an accent 'agent edited' chip, and hover-revealed RowActionButtons. Delete confirms through ConfirmationModal instead of window.confirm; the editor header gets a RowActionButton back affordance and a proper Save button with saving state. Adds MemoryTab stories and memory route mocks for Storybook. Signed-off-by: Thomas Kosiewski <tk@coder.com>

New Settings → Memory section (experiment-gated like Governor) consumes the shared MemoryBrowser with workspaceId null + global scope filter, riding the workspace-independent memory routes. Includes section stories and gating/redirect tests. Signed-off-by: Thomas Kosiewski <tk@coder.com>

… tree Within each scope section, files now group by their scope-relative path segments into a collapsible tree: dir rows (chevron + folder icon + recursive file count, default expanded), file leaves showing just the basename with indent guides. Dirs sort before files, each alphabetically. File rows keep description, usage stats, pin state, agent-edited badge, and hover actions, but drop the card border for a flat hover-highlight look. aria-labels keep the full scope-relative name for uniqueness. Signed-off-by: Thomas Kosiewski <tk@coder.com>

ThomasK33 · 2026-06-11T21:41:10Z

@codex review

mintlify · 2026-06-11T21:43:19Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
Mux	🟢 Ready	View Preview	Jun 11, 2026, 9:43 PM


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 798f6ddff7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…sub-project workspaces Codex review: config.cwd includes the sub-project segment on sub-project workspaces, splitting agent-created project memories from the Memory tab, index, and hot-set (which resolve the checkout root via resolveWorkspaceRootPath). Pass the checkout root into ToolConfiguration and prefer it in the memory tool's scope context.

ThomasK33 · 2026-06-11T21:48:53Z

@codex review

Addressed: project memory now anchors at the workspace checkout root (resolveWorkspaceRootPath) instead of the sub-project execution cwd — added workspaceCheckoutRootPath to ToolConfiguration, preferred in the memory tool's scope context, with a regression test (memory.test.ts: "resolves project memory from the checkout root, not the execution cwd"). 7b1f891

…hor exists Codex review round 7: multi-project workspaces execute in a shared container dir that is not a git repository; anchoring /memories/project there produced untracked files that die with the container. New shared resolveMemoryProjectAnchor (single source of truth for the tool config, index builder, hot-set, and oRPC routes) returns the single project checkout root, or null for multi-project workspaces and unresolvable roots — project commands then return a recoverable 'unavailable' error while global/workspace scopes keep working.

ThomasK33 · 2026-06-11T22:56:16Z

@codex review

Addressed: added resolveMemoryProjectAnchor(metadata, runtime) — the single resolution authority used by the memory tool config, <memory_index> builder, hot-set preloader, and memory.* oRPC routes. It returns the single-project checkout root, and null for multi-project workspaces (shared container is not a git repo) or unresolvable persisted roots; project-scope commands then fail with a recoverable "unavailable" error while global/workspace scopes keep working. Tests: "rejects project commands when no single checkout anchor exists" + "resolves the anchor to null for multi-project workspaces".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 328e8886e1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Codex review round 8: committed trees bypass the write-time MEMORY_MAX_FILES_PER_SCOPE cap, so a degenerate repo could force thousands of per-file prefix reads (each a remote command over SSH) during index/hot-set/tab builds. Enumeration now honors the cap: local walks stop early, the remote find pipes through head, and the index truncates (with a debug log) before any per-file reads.

ThomasK33 · 2026-06-11T23:02:38Z

@codex review

Addressed: enumeration now honors MEMORY_MAX_FILES_PER_SCOPE before any per-file reads — the local walk stops at the cap (+1 for overflow detection), the remote find pipes through head -n cap+1 (bounding output transfer in the same single command), and listIndexEntries truncates with a debug log. Regression test: "caps indexed files per scope to the declared limit" (cap+25 committed files → exactly cap entries).
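A rough sketch of the cap+1 pattern described above, assuming a POSIX shell on the remote side. `shellQuote`, `buildCappedListCommand`, and `applyCap` are illustrative names, not the PR's helpers; the constant's value comes from the 1000 files/scope cap in the PR summary.

```typescript
// 1000 files per scope, per the PR summary; the name mirrors the constant cited above.
const MEMORY_MAX_FILES_PER_SCOPE = 1000;

// Single-quote a string for a POSIX shell (hypothetical helper).
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// One remote command whose output is bounded at cap+1 lines.
// The extra line exists only so the caller can detect overflow.
function buildCappedListCommand(root: string, cap: number): string {
  return `find ${shellQuote(root)} -type f | head -n ${cap + 1}`;
}

// Truncate to the cap before any per-file reads happen.
function applyCap(paths: string[], cap: number): { entries: string[]; overflowed: boolean } {
  const overflowed = paths.length > cap;
  return { entries: overflowed ? paths.slice(0, cap) : paths, overflowed };
}

const command = buildCappedListCommand("/checkout/.mux/memory", MEMORY_MAX_FILES_PER_SCOPE);
```

Requesting cap+1 rather than cap lets the caller distinguish "exactly at the limit" from "over the limit" without a second command.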

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 39201a1ddd

…ped subsets Codex review round 9: the local walk's cap check ran only before readdir, so one flat directory could still accumulate every name before index truncation. The cap now applies per entry, directories iterate in sorted order, and the remote find pipes through sort before head — both stores keep the lexicographically-first N files when a committed tree exceeds the cap.

ThomasK33 · 2026-06-11T23:07:45Z

@codex review

Both addressed:

Per-entry cap (P2): the local walk now checks the cap on every entry (a single flat directory can exceed it alone) and iterates each directory in sorted order, so the kept subset is bounded at cap+1 and deterministic.
Deterministic remote subset (P3): the remote listing pipes find through sort before head -n cap+1, keeping transfer bounded while matching the local lexicographic-first-N behavior.
Test extended: the cap test now asserts the kept subset is exactly f0000.md … f0999.md.

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 8f6d9c47cb

…ested trees Sort directory entries with a path-string key (directories key as 'name/') so the local DFS emits exact global lexicographic path order, matching the remote 'find | sort | head' capped subset. A root file like 'a.md' now survives the cap even when a sibling directory alone exceeds it.

ThomasK33 · 2026-06-11T23:17:09Z

@codex review

Addressed the P2 (global lexicographic order under the enumeration cap) in 0cf6b5c: the local walk now sorts entries with a path-string key (directories key as name/), so the DFS emits exact global lexicographic path order and the capped subset matches the remote find | sort | head path. Regression test covers the exact scenario from the finding (a.md root file surviving the cap when sibling dir a/ alone exceeds it).
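The ordering trick can be illustrated on an in-memory tree. This is a sketch under assumptions: `TreeNode` and `walkCapped` are hypothetical, the real walk reads the filesystem, and the match with the remote listing assumes the remote `sort` compares bytes (for example a C locale).

```typescript
// null marks a file; a nested object marks a directory (hypothetical in-memory model).
interface TreeNode {
  [name: string]: TreeNode | null;
}

// Plain code-unit comparison, which equals byte order for ASCII names.
function byCodeUnit(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Depth-first walk that sorts each directory with directories keyed as "name/",
// and checks the limit on every entry rather than once per directory.
function walkCapped(root: TreeNode, limit: number): string[] {
  const out: string[] = [];
  const visit = (dir: TreeNode, prefix: string): void => {
    const entries = Object.entries(dir)
      .map(([name, child]) => ({ name, child, key: child === null ? name : `${name}/` }))
      .sort((x, y) => byCodeUnit(x.key, y.key));
    for (const { name, child } of entries) {
      if (out.length >= limit) return; // per-entry cap check
      if (child === null) out.push(prefix + name);
      else visit(child, `${prefix}${name}/`);
    }
  };
  visit(root, "");
  return out;
}
```

Because `.` sorts before `/`, the root file `a.md` is emitted before anything under `a/`, so it survives the cap even when `a/` alone would exhaust it.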

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 0cf6b5cecd

Read paths (index/hot-set enumeration on stream startup, Memory tab list, view) previously ensureRoot()'d each scope, leaving an untracked .mux/ directory in clean checkouts before any memory was written. Split the symlink-safety gate (assertRootSafe) from root creation (ensureRoot): reads assert safety and treat missing roots as empty, only file-creating writes (create, UI save) materialize the root.

- LocalMemoryStore.assertContained tolerates a missing root (nothing under a nonexistent root exists, so containment is trivial).
- RuntimeMemoryStore commands run from cwd / (the old default cwd was the root itself, which may now legitimately be missing); listFiles guards root existence explicitly so a missing root lists as empty.
- view of a scope root with no files reads as an empty directory instead of an error: the scope always exists in the protocol.

ThomasK33 · 2026-06-11T23:34:17Z

@codex review

Addressed the read-only root creation finding in ecda4fb: split the symlink-safety gate (assertRootSafe) from root creation (ensureRoot). Read paths (index/hot-set enumeration, Memory tab list, view) now assert safety only and treat missing roots as empty; only file-creating writes (create, UI save) materialize <checkout>/.mux/memory. Remote store commands now run from cwd / with an explicit existence guard on enumeration so a missing root lists as empty over SSH too. Regression tests assert a clean checkout stays untouched (no .mux/) across enumeration, virtual-root view, scope-root view, and missing-file reads — on both local and remote runtimes; verified against the real sshd integration fixture.
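A minimal local-only sketch of the read/write split, using synchronous Node.js `fs` calls for brevity. `listMemoryFiles` and `createMemoryFile` are hypothetical names, and the symlink-safety gate (assertRootSafe) is deliberately omitted here.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Read path: never creates the root; a missing root simply lists as empty.
function listMemoryFiles(root: string): string[] {
  try {
    return fs.readdirSync(root).sort();
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") return [];
    throw err; // anything other than "missing" is a real failure
  }
}

// Write path: the only place that materializes the root directory.
function createMemoryFile(root: string, name: string, content: string): void {
  fs.mkdirSync(root, { recursive: true });
  fs.writeFileSync(path.join(root, name), content);
}

// Usage against a throwaway "checkout" in the OS temp directory.
const checkout = fs.mkdtempSync(path.join(os.tmpdir(), "mux-memory-sketch-"));
const memoryRoot = path.join(checkout, ".mux", "memory");
const before = listMemoryFiles(memoryRoot); // empty, and nothing was created
const untouched = !fs.existsSync(path.join(checkout, ".mux"));
createMemoryFile(memoryRoot, "notes.md", "hello");
const after = listMemoryFiles(memoryRoot); // now contains notes.md
```

The property worth testing is the `untouched` flag: enumeration alone must not leave a `.mux/` directory behind in a clean checkout.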

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: ecda4fb7a7

…a committed names/descriptions parseMemoryPath now rejects '<', '>' and '"' in path segments: nested committed names could reassemble structure-breaking markup once joined with '/' ('a<' + 'memory_index>pwn.md' renders as 'a</memory_index>pwn.md' in the <memory_index> block). Windows forbids these in filenames anyway, so the restriction also keeps git-tracked project memories checkout-able cross-platform. Index enumeration already re-validates committed names, so hostile files are skipped rather than rendered. Frontmatter descriptions (repo-controlled, display-only) are now escaped at the index render sink with the same XML-escaping helper the hot-set renderer uses, so they cannot close the block or its quotes.

ThomasK33 · 2026-06-11T23:43:23Z

@codex review

Addressed the memory-index prompt-block breakout finding in b4fb1ed:

parseMemoryPath now rejects <, > and " in path segments — nested committed names can reassemble block-closing markup across segments once joined with / (a< + memory_index>pwn.md → a</memory_index>pwn.md), so per-segment rejection is required. Windows forbids these in filenames anyway, which also keeps git-tracked project memories checkout-able cross-platform. Index enumeration re-validates committed names, so hostile files are skipped rather than rendered (regression test asserts exactly one </memory_index> delimiter in the rendered block).
Frontmatter descriptions (display-only) are escaped at the index render sink with the same escapeXmlAttribute helper the hot-set renderer uses, so they cannot close the block or its quotes.
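The two defenses above might look roughly like this. It is a sketch, not the PR's code: `isSafeSegment`, `isSafeRelPath`, `escapeXmlAttributeSketch`, and the `<memory …/>` entry shape are assumptions for illustration (the real index format and the real `escapeXmlAttribute` helper are not shown in this thread).

```typescript
// Characters rejected per path segment, as described above.
const FORBIDDEN_SEGMENT_CHARS = /[<>"]/;

function isSafeSegment(segment: string): boolean {
  return (
    segment.length > 0 &&
    segment !== "." &&
    segment !== ".." &&
    !FORBIDDEN_SEGMENT_CHARS.test(segment)
  );
}

// Validation is per segment because hostile markup can be split across "/" boundaries.
function isSafeRelPath(relPath: string): boolean {
  return relPath.split("/").every(isSafeSegment);
}

// Stand-in for the escaping helper named in the thread; "&" is escaped first
// so already-escaped output is not double-interpreted.
function escapeXmlAttributeSketch(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hostile names are skipped (null); descriptions are escaped at the render sink.
function renderIndexEntry(relPath: string, description: string): string | null {
  if (!isSafeRelPath(relPath)) return null;
  return `<memory path="${relPath}" description="${escapeXmlAttributeSketch(description)}" />`;
}
```

For example, `a</memory_index>pwn.md` splits into the segments `a<` and `memory_index>pwn.md`, each of which fails validation on its own, so the entry is never rendered.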

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: b4fb1edd5e

…hange badges

- formatHotMemoriesBlock: repo-controlled content could contain '</memory_file></hot_memories>' to smuggle text outside the untrusted-data wrapper. Neutralize the exact closing sequences (full XML-escaping would mangle code-heavy notes); the replacement never reintroduces '</'.
- MemoryBrowser: workspace-scope memories are per-workspace state, but the change listener filtered by scope only, so another workspace's agent edit badged the same virtual path here. Ignore workspace-scope events from other workspaces; global/project scopes stay live (shared).

ThomasK33 · 2026-06-11T23:52:05Z

@codex review

Both findings addressed in 1e8086a:

Hot-memory content delimiters (P1): formatHotMemoriesBlock now neutralizes the exact block-closing sequences (</memory_file>, </hot_memories>) in preloaded content (</ → &lt;/), so repo-controlled bytes can no longer appear outside the untrusted-data wrapper. Targeted neutralization rather than full XML-escaping keeps code-heavy notes legible; the replacement never reintroduces </, so one pass suffices. Regression test asserts exactly one closing delimiter of each kind in the rendered block.
Workspace change badges (P2): MemoryBrowser now ignores workspace-scope events whose event.workspaceId differs from the bound workspace, so another workspace's /memories/workspace/... edit no longer badges the same virtual path here. Global/project events stay live as intended. Test covers both directions (foreign event → no badge; own event → badge).

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 1e8086ac69

MemoryChangeEvent now carries the emitting scope context's stable projectPath. The memory.onChange subscription (already workspace-bound) filters server-side: workspace-scope events from other workspaces and project-scope events from other projects are dropped — the same virtual path elsewhere is a physically different file, so it must not refresh or badge this subscriber's list. Global events stay shared. The R12 UI-side workspace filter moved into the router with the rest (single authoritative place; any client inherits correct behavior).

ThomasK33 · 2026-06-12T00:03:51Z

@codex review

Addressed in ed6d437: MemoryChangeEvent now carries the emitter's stable projectPath, and the memory.onChange subscription filters server-side — workspace-scope events from other workspaces and project-scope events from other projects are dropped before reaching any client (the same virtual path elsewhere is a physically different file). Global events stay shared. The previous UI-side workspace filter moved into the router along with the new project filter so there is a single authoritative filtering point. Router test covers all five lanes: foreign-workspace workspace event (dropped), foreign-project project event (dropped), global from anywhere (delivered), own workspace event (delivered), same-project event from a sibling workspace (delivered).
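The five lanes listed above reduce to a small predicate. The sketch below uses hypothetical event and subscriber shapes (`MemoryChangeEventSketch`, `SubscriberSketch`); only the `workspaceId` and `projectPath` fields are taken from the thread.

```typescript
type MemoryScope = "global" | "project" | "workspace";

// Hypothetical shapes; only workspaceId and projectPath are named in the thread.
interface MemoryChangeEventSketch {
  scope: MemoryScope;
  workspaceId: string;
  projectPath: string | null;
  path: string; // virtual path, e.g. /memories/project/notes.md
}

interface SubscriberSketch {
  workspaceId: string;
  projectPath: string | null;
}

// Server-side filter: the same virtual path in another workspace or project
// is a physically different file, so it must not reach this subscriber.
function shouldDeliver(event: MemoryChangeEventSketch, sub: SubscriberSketch): boolean {
  switch (event.scope) {
    case "global":
      return true; // shared by everyone on the host
    case "project":
      return event.projectPath !== null && event.projectPath === sub.projectPath;
    case "workspace":
      return event.workspaceId === sub.workspaceId;
  }
}
```

Keeping the predicate in the router rather than in each UI component means any client of the subscription inherits the same behavior.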

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: ed6d43782d

…et content '</memory_file >' or '</hot_memories\n>' survived the exact replaceAll while a model may read them as equivalent closers; match optional whitespace before '>' (and case variants) with one regex, preserving the matched text in the neutralized output.

ThomasK33 · 2026-06-12T00:11:05Z

@codex review

Addressed in the latest commit: neutralizeMemoryContent now matches closing tags with optional whitespace before > (and case variants) via /<\/(memory_file|hot_memories)(\s*)>/gi, preserving the matched name/whitespace in the neutralized &lt;/...> output. The replacement never reintroduces </, so one pass still suffices. Regression test covers </hot_memories >, </hot_memories\n>, and </MEMORY_FILE> and asserts exactly one structural closer of each kind (any spelling) remains in the block.
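A small sketch of the tolerant neutralization, using the regex quoted above. The replacement text is an assumption: the thread does not legibly show what `</` becomes, so `&lt;/` is used here as a plausible choice that cannot reintroduce `</`. The function name carries a `Sketch` suffix to mark it as illustrative.

```typescript
// Regex as quoted in the thread: closer name, optional whitespace, then ">".
const STRUCTURAL_CLOSER = /<\/(memory_file|hot_memories)(\s*)>/gi;

// Assumption: "</" is rewritten to "&lt;/". The matched tag name ($1) and
// whitespace ($2) are preserved so the note stays readable.
function neutralizeMemoryContentSketch(content: string): string {
  return content.replace(STRUCTURAL_CLOSER, "&lt;/$1$2>");
}

const sample = "note\n</hot_memories >\n</MEMORY_FILE>\ncode: a </ b";
const neutralized = neutralizeMemoryContentSketch(sample);
```

Only the two structural closers are touched; an unrelated `</` elsewhere in a code-heavy note passes through unchanged, which is the stated reason for preferring targeted neutralization over full XML-escaping.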

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 3ee635357e

…Ready AgentSession computed and segment-cached the block before streamMessage ran its ensureReady startup path; on a stopped Docker/remote workspace, project-scope listing failed silently and the empty/partial block stayed cached for the whole segment. streamMessage now takes a resolveHotMemoriesBlock callback invoked right after ensureReady succeeds — the session still owns the per-segment cache (and its compaction-boundary reset still happens before the stream), and runtime-start UX stays on streamMessage's statusSink path.

ThomasK33 · 2026-06-12T00:25:03Z

@codex review

Addressed in the latest commit: streamMessage now receives a resolveHotMemoriesBlock callback instead of a pre-rendered string and invokes it immediately after runtime.ensureReady() succeeds — so project-scope listing on Docker/remote workspaces always sees a running runtime before anything is cached. AgentSession still owns the per-segment cache, the compaction-boundary reset still precedes the stream (so a just-consumed boundary recomputes for the same stream), and the container-start UX stays on streamMessage's statusSink path rather than running silently before stream startup. New regression test wraps runtimeFactory.createRuntime to record call order and asserts the resolver runs strictly after ensureReady and that the resolved block reaches the stream system context.
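The ordering guarantee can be sketched with a resolver callback and a session-owned cache. All names below (`RuntimeSketch`, `HotMemoryCacheSketch`, `streamMessageSketch`) are hypothetical, and the real streamMessage does far more than this.

```typescript
interface RuntimeSketch {
  ensureReady(): Promise<void>;
}

// Session-owned, per-segment cache: computes at most once until reset.
class HotMemoryCacheSketch {
  private cached: string | null = null;
  private readonly compute: () => Promise<string>;

  constructor(compute: () => Promise<string>) {
    this.compute = compute;
  }

  // Called at compaction boundaries, before the next stream starts.
  resetAtBoundary(): void {
    this.cached = null;
  }

  async resolve(): Promise<string> {
    if (this.cached === null) this.cached = await this.compute();
    return this.cached;
  }
}

// The stream takes a resolver instead of a pre-rendered string, so the block
// is computed only after the runtime is known to be running.
async function streamMessageSketch(
  runtime: RuntimeSketch,
  resolveHotMemoriesBlock: () => Promise<string>,
): Promise<string> {
  await runtime.ensureReady();
  const block = await resolveHotMemoriesBlock();
  return `system context\n${block}`;
}
```

A caller would pass `() => cache.resolve()`, which keeps cache ownership in the session while letting the stream decide when the first computation may happen.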

chatgpt-codex-connector · 2026-06-12T00:32:19Z

Codex Review: Didn't find any major issues. Keep it up!

ThomasK33 added 24 commits June 11, 2026 19:31

feat(memory): add memory experiment flag and constants (Phase 0)

ef66e6a

Signed-off-by: Thomas Kosiewski <tk@coder.com>

feat(memory): add MemoryService with three scopes, security envelope, and change events

5ab5829

Signed-off-by: Thomas Kosiewski <tk@coder.com>

feat(memory): add memory tool with flattened schema, alias shims, and per-scope write policy

fa9a826

Signed-off-by: Thomas Kosiewski <tk@coder.com>

feat(memory): wire memory experiment through aiService, system-context index injection, and coreServices

b644a2b

Signed-off-by: Thomas Kosiewski <tk@coder.com>

feat(memory): add MemoryToolCall chat renderer with Brain icon and per-command display

ee1a53e

Signed-off-by: Thomas Kosiewski <tk@coder.com>

test(memory): use async fs in memory tests; regenerate tool docs and built-in skill content

37d2318

Signed-off-by: Thomas Kosiewski <tk@coder.com>

feat(memory): add MemoryMetaService sidecar with pins keyed by logical identity

f31e835

Signed-off-by: Thomas Kosiewski <tk@coder.com>

feat(memory): add UI whole-file read/save with sha256 preconditions; index entries carry scope+relPath

fb846f7

Signed-off-by: Thomas Kosiewski <tk@coder.com>

feat(memory): add experiment-gated memory.* oRPC routes (list/read/save/delete/setPinned/onChange)

63146b1

Signed-off-by: Thomas Kosiewski <tk@coder.com>

feat(memory): add experiment-gated Memory tab with scope-grouped list, editor, pins, and live agent-edit badges

7ecbb7f

Signed-off-by: Thomas Kosiewski <tk@coder.com>

chore(memory): fix lint in memory tab + router tests

0398473

Signed-off-by: Thomas Kosiewski <tk@coder.com>

docs: fix stale 'bun run debug ui-messages' reference (Gate G1 finding #1)

9a473a5

Signed-off-by: Thomas Kosiewski <tk@coder.com>

chore: regenerate built-in skill content after AGENTS.md debug-CLI doc fix

2f2f1c6

Signed-off-by: Thomas Kosiewski <tk@coder.com>

feat(memory): show lastAccessed/accessCount usage stats in the Memory tab

9971924

memory.list now returns sidecar usage stats per file; rows render 'Used N× · <relative time>' for used files only. Signed-off-by: Thomas Kosiewski <tk@coder.com>

fix(memory): update SSH integration test for Phase 3 MemoryService constructor and projectPath

c755b68

mintlify Bot deployed to staging - docs June 11, 2026 21:43

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread src/node/services/tools/memory.ts Outdated

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread src/node/services/memoryService.ts

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread src/node/services/memoryService.ts

Comment thread src/node/services/memoryService.ts Outdated

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread src/node/services/memoryService.ts

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread src/node/services/memoryService.ts

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread src/node/services/memoryService.ts

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread src/node/services/memoryHotSet.ts Outdated

Comment thread src/browser/features/Memory/MemoryBrowser.tsx

Conversation

ThomasK33 commented Jun 11, 2026

Summary

Background

Implementation

Validation

Risks

Pains

Agent Memory in Mux — Implementation Plan

Locked decisions

Architecture

Scopes and physical mapping

Tool surface (model-facing)

File format

Security envelope (enforced once, in MemoryService)

Concurrency & conflicts

Experiment gating (memory)

Mode / sub-agent write policy (command-level, enforced in the handler)

Phases

Phase 0 — Experiment flag + constants (~40 net LoC)

Phase 1 — MemoryService + memory tool (agent-facing MVP) (~1,200 net LoC)

Phase 2 — Curation UI (Memory tab) (~750 net LoC)

Phase 3 — Usage stats, hot set, preloading (~400 net LoC)

Phase 4 — Later (not in this change set)

Risks / notes

Acceptance criteria
