🤖 feat: agent memory — six-command memory tool, three scopes, curation UI, hot-set preloading (experiment)#3526

Open
ThomasK33 wants to merge 41 commits into main from agent-memory-6n9v

@ThomasK33

Member

Summary

Adds agent memory to Mux behind a new memory experiment (off by default): a provider-agnostic memory tool implementing Anthropic's six-command protocol (view/create/str_replace/insert/delete/rename), backed by a main-process MemoryService with three scopes (global / project / workspace), a tree-styled curation UI (right-sidebar Memory tab + Settings → Memory for global files), and usage-tracked hot-memory preloading with prompt-cache-stable injection.

Background

Agents forget durable user preferences and project facts between workspaces. This PR gives them a filesystem-like memory surface they can read/write via tool calls, plus automatic context injection so frequently-used/pinned memories are available without tool calls. Design follows Anthropic's memory tool semantics so models prompted for that protocol work out of the box, while staying provider-agnostic (flattened nullish schema; verified zero invalid tool calls across Anthropic/OpenAI/Google in dogfooding).

Implementation

  • Scopes (models only ever see virtual /memories/... paths):
    • /memories/global/… → <muxHome>/memory/ (host-local, shared across projects)
    • /memories/project/… → <checkout>/.mux/memory/ via the Runtime abstraction (git-tracked, works over SSH; remote writes via temp+mv)
    • /memories/workspace/… → session dir (dies with the workspace)
  • Security envelope enforced once in MemoryService: pre-resolution traversal rejection, realpath parent-walk symlink containment (local + shell-quoted remote), atomic writes, 100KB/file + 1000 files/scope caps, self-healing loads, plain-text-only rendering (memory content is attacker-influenceable).
  • Access policy per agent class × scope: exec rw everywhere; plan-like read-only on project scope (tracked tree); explore view-only.
  • Context tiers: static tool description → per-request memory index (path + frontmatter description, system-message tail) → hot set (pinned + top-K by decayed usage, 16KB/item / 48KB total, recomputed only at session start/compaction boundaries for cache stability) → cold via tool calls.
  • Pins + usage stats live in a host-local sidecar (memory-meta.json) keyed by logical identity — never in git-tracked files (a cloned repo must not be able to force itself into the hot tier).
  • UI: shared MemoryBrowser/MemoryFileEditor render scope sections with a native directory tree (chevrons, indent guides, recursive counts), descriptions, usage stats, pin/delete row actions, agent-edited badges via memory.onChange, and sha256-conflict-safe saves. Settings → Memory manages global files workspace-independently (workspaceId is nullish on memory.* routes). Server-side experiment gating on every route.

Validation

  • Three blocking dogfood gates with screenshot/recording/raw-request evidence:
    • G1: cross-workspace remember→recall unprompted across 3 providers (11 calls, 0 invalid); index injection verified in raw provider requests; bounded re-cache on index change (tool/schema prefix stays cached); SSH project-scope round-trip (jest integration test against the sshd fixture).
    • G2: UI edit visible to a running agent's next view; conflicting save surfaces a conflict banner with no silent clobber; pin survives reload.
    • G3: pinned memory canary answered from context with zero tool calls (proven in raw request bytes); hot set byte-identical across turns (sha256 match, 18.8k cache-read); index ≈434 chars + hot set ≈578 chars measured.
  • ~150 unit/integration tests across memoryService (security matrix, concurrency, remote runtime), tool dispatch (18 mode-matrix cells), router (experiment-off rejection, sha preconditions), sidecar self-healing, hot-set selection/caching, and UI flows.
  • make static-check green.

Risks

  • Off-by-default experiment: with it off there is no tool, no index, no tab, and routes reject — zero behavior change for existing users.
  • Highest-regression-risk touchpoints are aiService/streamContextBuilder (system-message assembly) and agentSession (hot-block caching); both are additive and experiment-gated, with tests asserting byte-stable output and zero context change when disabled.
  • Known limitations documented in code: no cross-process file locking (single main process assumption); mid-session pins take effect at the next session segment by design.

Pains

  • dev-server-sandbox default seeding resumed parent-instance tasks and deleted a live agent worktree during gate G3 (recovered from committed state; --clean-projects avoids it) — worth a separate fix.
  • Seeded gateway credentials 403 in sandboxes; gates had to wire providers from env keys.

📋 Implementation Plan

Agent Memory in Mux — Implementation Plan

Adopt Anthropic's six-command memory protocol (view, create, str_replace, insert, delete, rename) as a provider-agnostic Mux tool, backed by a main-process MemoryService with three scopes (global / project / workspace), a curation UI in the right sidebar, and usage-tracked hot-memory preloading. Everything is gated behind a new memory experiment.

Grounded in: deep-research findings (verified against Anthropic SDK/docs, Vercel AI SDK, MCP memory server) + four repo investigations (experiments system, runtime abstraction, tool pipeline, service/IPC/UI patterns).

Locked decisions

| Decision | Choice |
| --- | --- |
| create on existing file | Error (current Anthropic spec). Overwrite = delete + create. Documented in tool description |
| Project memories in git | Committed by default (reviewable in PRs via normal Review pane flow) |
| Scope rollout | All three scopes at once (global, project, workspace) |
| Runtimes | Must work on all runtimes (local, worktree, SSH); project scope goes through the Runtime abstraction |
| Gating | memory experiment (EXPERIMENT_IDS.MEMORY), off by default, user-overridable, shown in Settings |

Architecture

flowchart TD
    Agent["Agent tool call<br/>memory (6 commands)"] --> Norm["Zod preprocess + flattened<br/>nullish schema"]
    UI["Memory tab (right sidebar)<br/>view / edit / pin / delete"] -->|oRPC| Svc
    Norm --> Svc["MemoryService (main process)<br/>path validation · per-root mutex ·<br/>atomic writes · stats · events"]
    Svc -->|"local fs"| G["~/.mux/memory/ (global)"]
    Svc -->|"runtime.readFile/writeFile/exec<br/>(works over SSH)"| P["&lt;checkout&gt;/.mux/memory/<br/>(project, git-tracked)"]
    Svc -->|"local fs"| W["~/.mux/sessions/&lt;ws&gt;/memory/<br/>(workspace, ephemeral)"]
    Svc -.->|"EventEmitter → oRPC eventIterator"| UI
    Svc -.->|"index → late context block<br/>hot set → context injection (P3)"| Ctx["Model context"]

Scopes and physical mapping

Models only ever see virtual paths; MemoryService maps them. Physical paths never leak into context.

| Virtual path | Physical location | Access | Lifecycle |
| --- | --- | --- | --- |
| /memories/global/… | <muxHome>/memory/ on the host (getMuxHome() from src/common/constants/paths.ts) | local fs | Permanent; shared across all projects |
| /memories/project/… | <workspace checkout>/.mux/memory/ | Runtime (runtime.readFile/stat/ensureDir + exec for listing/realpath); writes: write-file-atomic on local runtimes, SSHRuntime.writeFile (temp+mv) on remote | Git-tracked; travels with worktrees; merges via normal git flow |
| /memories/workspace/… | config.getSessionDir(workspaceId)/memory/ on the host | local fs | Deleted with the workspace |

Cross-runtime notes (verified):

  • Tool handlers run in the main process and already receive config.runtime + config.cwd; readFileString/writeFileString (src/node/utils/runtime/helpers.ts) work transparently over SSH (SSHRuntime.writeFile is already atomic: cat > tmp && mv).
  • Session dirs are always host-local, even for SSH workspaces → workspace scope is uniform everywhere.
  • Global scope is host-local by definition (the main process runs on the host); precedent: global skills always resolve through a local runtime (resolveGlobalRuntime in agent skills).
  • Remote listing mirrors skills discovery: RemoteRuntime → execBuffered(runtime, "find …") (see listSkillDirectoriesFromRuntime in src/node/services/agentSkills/agentSkillsService.ts).
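
The scope mapping above can be sketched as a small resolver that splits a virtual path into scope and scope-relative path, then joins it onto a physical root. This is a minimal illustration only: `ScopeRoots`, `resolveVirtualPath`, and the example roots are hypothetical names, not the PR's actual API, and traversal/symlink validation is assumed to happen in a separate step.

```typescript
// Hypothetical sketch: maps model-facing virtual paths to physical locations.
// All names here are illustrative assumptions, not Mux's real identifiers.
type MemoryScope = "global" | "project" | "workspace";

interface ScopeRoots {
  global: string; // e.g. <muxHome>/memory
  project: string | null; // <checkout>/.mux/memory, or null when unavailable
  workspace: string | null; // <sessionDir>/memory, or null without a workspace
}

interface ResolvedMemoryPath {
  scope: MemoryScope;
  relPath: string;
  physicalPath: string;
}

const VIRTUAL_ROOT = "/memories";

function isMemoryScope(value: string | undefined): value is MemoryScope {
  return value === "global" || value === "project" || value === "workspace";
}

function resolveVirtualPath(virtualPath: string, roots: ScopeRoots): ResolvedMemoryPath {
  const prefix = `${VIRTUAL_ROOT}/`;
  if (!virtualPath.startsWith(prefix)) {
    throw new Error(`Path must start with ${prefix}`);
  }
  const [scopeName, ...rest] = virtualPath.slice(prefix.length).split("/");
  if (!isMemoryScope(scopeName)) {
    throw new Error(`Unknown memory scope: ${String(scopeName)}`);
  }
  const root = roots[scopeName];
  if (root == null) {
    // Recoverable error: the model can fall back to another scope.
    throw new Error(`Scope ${scopeName} is unavailable in this context`);
  }
  // Drop empty segments so a trailing slash does not produce "//".
  const relPath = rest.filter((segment) => segment.length > 0).join("/");
  const physicalPath = relPath.length > 0 ? `${root}/${relPath}` : root;
  return { scope: scopeName, relPath, physicalPath };
}
```

Only `scope` and `relPath` would ever be echoed back to the model; `physicalPath` stays inside the service.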

Tool surface (model-facing)

One memory tool. Flattened object schema (not a discriminated union — strict-mode providers flatten unions poorly): command enum + all op-specific fields .nullish(), dispatch in the handler with != null checks. This follows the repo's documented conventions in src/common/utils/tools/toolDefinitions.ts.

command: "view" | "create" | "str_replace" | "insert" | "delete" | "rename"
path, file_text, old_str, new_str, insert_line, insert_text, old_path, new_path  → all .nullish()
offset, limit  → .nullish() numbers (replaces Anthropic's view_range tuple; mirrors file_read)
  • Zod preprocessor shims (same mechanism as bash command→script): file_path/filePath→path, content→file_text, old_string→old_str, new_string→new_str.
  • Reference semantics copied from the Anthropic SDK: view on a directory lists ≤2 levels deep, excludes dotfiles; str_replace requires unique old_str and returns matching line numbers on ambiguity (recoverable tool error); insert at line N; rename for move; delete for file/dir.
  • create errors on existing files (locked decision).
  • Static tool description; dynamic index in a late context block. The tool description carries only the protocol ("check relevant memories before acting; record durable facts/preferences") + command semantics, so the cached tool/schema prefix stays stable. The per-request memory index (virtual path + one-line description per file, plus a pinned marker once P2 sidecar pins exist) is injected at stream-context assembly (buildStreamSystemContext in src/node/services/streamContextBuilder.ts) as a late context block — preferred placement: transient system-reminder-style block near the tail of the message list (cache-optimal; the tail changes every turn anyway), falling back to an end-of-system-message section if no tail-injection mechanism fits cleanly. This deliberately diverges from the skills index (which lives in agent_skill_read's description): skills change rarely, memory files change mid-session, and tool-description churn would invalidate the cached tool prefix. G1 measures cache hit rates to validate the placement.
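
As a rough illustration of the alias shims and `!= null` dispatch described above, the sketch below normalizes common field-name variants and reports which fields a command is still missing. It deliberately avoids Zod to stay self-contained (the plan describes a Zod preprocessor), and the per-command required-field table is an assumption inferred from the six-command protocol, not copied from the PR.

```typescript
// Hypothetical sketch of the preprocessor shims; the real tool is described
// as using a Zod preprocess step, which this plain function only imitates.
type MemoryCommand = "view" | "create" | "str_replace" | "insert" | "delete" | "rename";

// Alias -> canonical field name, as listed in the plan.
const FIELD_ALIASES: Record<string, string> = {
  file_path: "path",
  filePath: "path",
  content: "file_text",
  old_string: "old_str",
  new_string: "new_str",
};

function normalizeMemoryArgs(raw: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = { ...raw };
  for (const [alias, canonical] of Object.entries(FIELD_ALIASES)) {
    // Only fill the canonical field when the model did not already set it.
    if (out[alias] != null && out[canonical] == null) {
      out[canonical] = out[alias];
    }
    delete out[alias];
  }
  return out;
}

// Assumed required fields per command (illustrative, not from the PR).
const REQUIRED_FIELDS: Record<MemoryCommand, readonly string[]> = {
  view: ["path"],
  create: ["path", "file_text"],
  str_replace: ["path", "old_str", "new_str"],
  insert: ["path", "insert_line", "insert_text"],
  delete: ["path"],
  rename: ["old_path", "new_path"],
};

// Every op-specific field is nullish in the flattened schema, so the handler
// checks presence itself with `== null` (covers both null and undefined).
function missingFields(command: MemoryCommand, args: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS[command].filter((field) => args[field] == null);
}
```

A handler built this way can turn a non-empty `missingFields` result into a recoverable tool error instead of a schema rejection, which is the point of keeping the schema flat.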

File format

Markdown with optional YAML frontmatter carrying display metadata only (description, used for the index — same repo trust level as the existing skills index). Pins and usage stats never live in the files: a committed pinned: true in a cloned repo would force attacker-chosen content into the hot context tier, and stats in git-tracked files create commit noise/merge conflicts. Both live in a host-local, service-owned sidecar; pinning is a user/UI action only.

Security envelope (enforced once, in MemoryService)

Verified against Anthropic's reference implementations + existing skills containment checks:

  • Reject absolute paths, ~, .. segments, URL-encoded traversal before resolution; then resolve and enforce prefix containment under the scope root.
  • Symlink escape prevention: local scopes via fs.realpath walk (parent-walking when components don't exist — Python SDK _validate_no_symlink_escape pattern); project scope on remote runtimes via runtime.exec with every path shell-quoted, the root ensureDir'd first, then a remote realpath parent-walk containment check before any mutation or listing.
  • Atomic writes: all local-disk writes (global, workspace, and project scope on local/worktree runtimes) go through service-level write-file-atomic (already a repo dependency) — never raw streams; project scope on remote runtimes uses SSHRuntime.writeFile (already temp+mv).
  • Index hardening: frontmatter description is repo-controlled — the index renders it single-line, truncated (~200 chars), control characters stripped, quoted as data; the index block notes that project memory metadata/content is untrusted input until the user/agent chooses to rely on it.
  • Caps: 100KB/file, 1,000 files/scope (constants in src/common/constants/); str_replace/insert only on UTF-8 text.
  • Self-healing: malformed frontmatter/stats files are skipped/sanitized at load — never brick a workspace (crash-resilience doctrine).
  • Renderer: memory content is attacker-influenceable (project memories arrive via cloned repos) — render as plain text/React trees, no innerHTML-family sinks; no SECURITY-AUDIT-worthy sinks added.
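
The first two bullets can be illustrated with a lexical pre-check followed by a containment check on already-resolved paths. This is a minimal sketch under stated assumptions: the function names are hypothetical, only one percent-decoding pass is examined, and the realpath parent-walk itself (local `fs.realpath` or a remote command) is assumed to run between the two steps.

```typescript
// Hypothetical sketch: lexical rejection before resolution, then prefix
// containment after resolution. Names are illustrative, not the PR's API.
function validateRelativeMemoryPath(relPath: string): string {
  let decoded: string;
  try {
    decoded = decodeURIComponent(relPath);
  } catch {
    throw new Error("Invalid path: malformed percent-encoding");
  }
  // Check both the raw and once-decoded forms so "%2e%2e" cannot slip through.
  for (const candidate of [relPath, decoded]) {
    if (candidate.startsWith("/")) {
      throw new Error("Invalid path: absolute paths are not allowed");
    }
    const segments = candidate.split("/");
    const first = segments[0];
    if (first !== undefined && first.startsWith("~")) {
      throw new Error("Invalid path: home-directory expansion is not allowed");
    }
    if (segments.includes("..")) {
      throw new Error("Invalid path: '..' segments are not allowed");
    }
  }
  return relPath;
}

// Applied to real (symlink-resolved) paths. The trailing slash matters:
// without it, "/r/memory-evil" would pass a naive startsWith("/r/memory").
function assertContained(scopeRootReal: string, targetReal: string): void {
  const rootWithSlash = scopeRootReal.endsWith("/") ? scopeRootReal : `${scopeRootReal}/`;
  if (targetReal !== scopeRootReal && !targetReal.startsWith(rootWithSlash)) {
    throw new Error("Invalid path: resolves outside the memory scope root");
  }
}
```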

Concurrency & conflicts

  • All writes (agent tool + UI) funnel through MemoryService → MutexMap keyed by physical root (src/node/utils/concurrency/mutexMap.ts), which eliminates intra-process races. No filesystem locking in v1 (single main process; the dev-server sandbox uses its own MUX_ROOT, so cross-process collisions on the same global root are out of scope, a documented limitation).
  • Agent str_replace is naturally optimistic (old_str must match → recoverable tool error on conflict).
  • UI saves carry a contentSha256 captured at load; service rejects on mismatch (re-read & retry prompt) — the verified optimistic-concurrency pattern.
  • Project scope across parallel agents rides worktree isolation + git merge (no runtime contention by construction); only global scope truly contends, and it's serialized by the mutex.
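
The UI save precondition can be sketched as a compare-then-write on a content hash. The in-memory `Map` below stands in for the file system, and `saveWithPrecondition` is a hypothetical name; in the described design the check and the write would both run while holding the per-root mutex.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of UTF-8 text.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

type SaveResult =
  | { ok: true; contentSha256: string }
  | { ok: false; reason: "conflict"; currentSha256: string };

// Hypothetical sketch: `store` is a stand-in for the memory files on disk.
function saveWithPrecondition(
  store: Map<string, string>,
  key: string,
  newContent: string,
  expectedSha256: string,
): SaveResult {
  const current = store.get(key) ?? "";
  const currentSha256 = sha256Hex(current);
  if (currentSha256 !== expectedSha256) {
    // Someone (for example the agent) wrote since the editor loaded the file:
    // reject rather than silently overwrite.
    return { ok: false, reason: "conflict", currentSha256 };
  }
  store.set(key, newContent);
  return { ok: true, contentSha256: sha256Hex(newContent) };
}
```

A caller that receives `reason: "conflict"` would re-read the file and prompt the user to retry, matching the conflict banner described for the Memory tab.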

Experiment gating (memory)

  • Define MEMORY: "memory" in EXPERIMENT_IDS + registry entry in EXPERIMENTS (src/common/constants/experiments.ts): enabledByDefault: false, userOverridable: true, showInSettings: true → appears automatically in Settings → Experiments.
  • Backend: aiService.ts resolves experimentsService.isExperimentEnabled(EXPERIMENT_IDS.MEMORY) (with client override passthrough, same as DYNAMIC_WORKFLOWS) → passed into getToolsForModel (src/common/utils/tools/tools.ts) to conditionally register the memory tool. Experiment off ⇒ no tool, no index, zero context cost.
  • Frontend: Memory tab gated via featureFlag: EXPERIMENT_IDS.MEMORY in TAB_CONFIG_DEF (src/browser/features/RightSidebar/Tabs/tabConfig.ts); components use useExperimentValue(EXPERIMENT_IDS.MEMORY).
  • Server-side enforcement: the oRPC memory.* routes themselves check isExperimentEnabled and reject when disabled — UI hiding alone is not the gate.
  • Tests: backend spyOn(experimentsService, "isExperimentEnabled"); frontend spyOn(ExperimentsModule, "useExperimentValue") (established patterns).
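
A rough sketch of how the three gating layers could fit together follows. Everything here is hypothetical (`ExperimentDefinition`, `isExperimentOn`, `toolNamesForRequest`, and `guardMemoryRoute` are illustrative names); only the flag values `enabledByDefault: false`, `userOverridable: true`, and `showInSettings: true` come from the plan.

```typescript
// Hypothetical sketch of experiment resolution plus tool and route gating.
interface ExperimentDefinition {
  id: string;
  enabledByDefault: boolean;
  userOverridable: boolean;
  showInSettings: boolean;
}

const MEMORY_EXPERIMENT: ExperimentDefinition = {
  id: "memory",
  enabledByDefault: false,
  userOverridable: true,
  showInSettings: true,
};

// A user override only counts when the experiment allows overriding.
function isExperimentOn(def: ExperimentDefinition, userOverride?: boolean): boolean {
  if (def.userOverridable && userOverride !== undefined) {
    return userOverride;
  }
  return def.enabledByDefault;
}

// Tool registration: with the experiment off, the memory tool is simply absent.
function toolNamesForRequest(baseTools: readonly string[], memoryEnabled: boolean): string[] {
  return memoryEnabled ? [...baseTools, "memory"] : [...baseTools];
}

// Server-side route guard: hiding the UI is not treated as the gate.
function guardMemoryRoute<T>(memoryEnabled: boolean, handler: () => T): T {
  if (!memoryEnabled) {
    throw new Error("memory experiment is disabled");
  }
  return handler();
}
```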

Mode / sub-agent write policy (command-level, enforced in the handler)

The memory tool handler receives a memoryAccess policy via ToolConfiguration (alongside the existing planFileOnly plumbing used by validatePlanModeAccess in src/node/services/tools/fileCommon.ts) and enforces it per command + scope — not via regex tool policy, which can only match tool names:

| Agent context | global | project | workspace |
| --- | --- | --- | --- |
| Exec-like | read/write | read/write | read/write |
| Plan-like | read/write | read-only | read/write |
| Explore/read-only | read-only | read-only | read-only |

Rationale: plan mode must not mutate the tracked source tree, and project memories are git-tracked — so project scope is read-only in plan mode, while global/workspace writes (agent-owned, untracked state) stay allowed for capturing lessons during planning. Exec-mode project writes surface in the Review pane like any diff (a feature). Tests assert mutating commands are rejected per matrix cell.
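
The matrix lends itself to a lookup table checked per command and scope. The sketch below is illustrative only: `checkMemoryAccess` and the type names are hypothetical, and a real handler would presumably check both the source and destination scopes for `rename`.

```typescript
// Hypothetical sketch of command-level access enforcement.
type AgentClass = "exec" | "plan" | "explore";
type MemoryScope = "global" | "project" | "workspace";
type MemoryCommand = "view" | "create" | "str_replace" | "insert" | "delete" | "rename";
type Access = "read-write" | "read-only";

// Mirrors the access table: rows are agent classes, columns are scopes.
const ACCESS_MATRIX: Record<AgentClass, Record<MemoryScope, Access>> = {
  exec: { global: "read-write", project: "read-write", workspace: "read-write" },
  plan: { global: "read-write", project: "read-only", workspace: "read-write" },
  explore: { global: "read-only", project: "read-only", workspace: "read-only" },
};

// Every command except view changes state.
const MUTATING_COMMANDS: ReadonlySet<MemoryCommand> = new Set<MemoryCommand>([
  "create",
  "str_replace",
  "insert",
  "delete",
  "rename",
]);

// Returns null when allowed, or a recoverable error message when denied.
function checkMemoryAccess(
  agent: AgentClass,
  scope: MemoryScope,
  command: MemoryCommand,
): string | null {
  if (!MUTATING_COMMANDS.has(command)) {
    return null; // view is allowed everywhere
  }
  if (ACCESS_MATRIX[agent][scope] === "read-write") {
    return null;
  }
  return `memory ${command} is not allowed on ${scope} scope for ${agent} agents (read-only)`;
}
```

Returning a message rather than throwing keeps the denial a recoverable tool error, so the model can retry against a writable scope.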


Phases

Phase 0 — Experiment flag + constants (~40 net LoC)

  • src/common/constants/experiments.ts: MEMORY id + definition.
  • src/common/constants/memory.ts (new): scope ids, virtual root (/memories), caps, char budgets.
  • Settings → Experiments row appears automatically.

Phase 1 — MemoryService + memory tool (agent-facing MVP) (~1,200 net LoC)

Files:

  • src/node/services/memoryService.ts (new, ~400): scope mapping, path validation + symlink containment (local + via-runtime), six command implementations, per-root MutexMap, caps, EventEmitter change events {scope, path, actor, workspaceId}. Constructor-injected via ServiceContainer/coreServices (existing DI pattern).
  • src/node/services/tools/memory.ts (new, ~250): ToolFactory dispatching to MemoryService; static tool description (protocol + semantics); per-command+scope memoryAccess enforcement; recoverable error strings copied from the Anthropic SDK semantics.
  • src/node/services/streamContextBuilder.ts (~60): build + inject the per-request memory index context block (experiment-gated).
  • src/common/utils/tools/toolDefinitions.ts (~80): flattened schema + preprocessor shims.
  • src/common/utils/tools/tools.ts + src/node/services/aiService.ts (~40): registration in runtimeTools (project scope needs workspace init), experiment-gated.
  • src/browser/features/Tools/Shared/getToolComponent.ts + ToolPrimitives.tsx + src/browser/features/Tools/MemoryToolCall.tsx (new, ~150): chat renderer (per-command compact display), lucide icon mapping (no emoji).
  • Sub-agents: tool availability per existing tool policy; write permissions per the command-level memoryAccess matrix above (explore agents get view only).

Tests (not counted in LoC): traversal/symlink/atomicity/create-errors/str_replace-ambiguity unit tests against a temp dir (incl. local project create/edit atomicity); per-command handler tests; mode-matrix rejections per cell (plan-mode project mutations reject; explore mutations reject); remote containment (shell-quoted traversal + symlink-escape attempts through the runtime, SSH fixture); cross-workspace remember→recall integration test; experiment-off ⇒ tool absent.

Gate G1 (dogfood, blocking):

  1. make dev-server-sandbox (isolated MUX_ROOT, free ports — per dev-server-sandbox skill). Bootstrap browser workflows first: agent-browser skills get core.
  2. Drive the web UI with agent-browser (per agent-browser/dogfood skills): enable the memory experiment in Settings; in workspace A tell the agent "remember that I prefer X"; verify file exists under the sandbox MUX_ROOT/memory/… (bash); create workspace B; ask "what do you know about my preferences?" → agent recalls via view unprompted.
  3. Repeat the loop across ≥3 providers (Anthropic, OpenAI, Google) and record invalid-tool-call rates — empirically settles the flattened-schema question from the research.
  4. Verify index injection with bun run debug ui-messages --workspace <name>; record provider cache-read/cache-write token stats across consecutive turns (and across a mid-session memory create) to confirm the index placement preserves prompt caching.
  5. SSH runtime: integration test against SSHRuntime (localhost sshd or existing SSH test fixtures) — project-scope create/view round-trip.
  6. Evidence: annotated screenshots (agent-browser screenshot --annotate) + repro video (record start/stop) attached via attach_file.

Phase 2 — Curation UI (Memory tab) (~750 net LoC)

Files:

  • src/common/orpc/schemas/api.ts + src/node/orpc/router.ts (~180): memory.list (bulk: all scopes, one call), memory.read, memory.save (with expectedSha256), memory.delete, memory.setPinned, memory.onChange (eventIterator, async-generator bridge off the service's EventEmitter — same shape as workspace.onMetadata). All routes take workspaceId; the router resolves that workspace's metadata + active Runtime + checkout cwd through the existing WorkspaceService/runtime plumbing (same as plan-file IPC) so project scope never touches local fs paths on SSH workspaces. Routes reject when the experiment is off.
  • src/node/services/memoryMeta.ts (new, ~80): host-local sidecar store (<muxHome>/memory-meta.json) introduced here with schema {pinned} per logical key, atomic persistence (write-file-atomic), self-healing load (corrupt file ⇒ start empty + log). P3 extends the schema with usage stats.
  • tabConfig.ts + tabRegistry.tsx (~30): memory tab with featureFlag.
  • src/browser/features/RightSidebar/Memory/* (new, ~450): scope-grouped file list, Markdown view/edit, pin toggle, delete, "agent edited" badge from change events, conflict banner on 409-style save rejection. Leaf components subscribe directly (colocation rule); usePersistedState only for pure UI state (expanded groups); pins live in the service-owned host-local sidecar (introduced here for pins; P3 extends it with usage stats), not in files or localStorage. Keyboard shortcut in KEYBINDS (hidden on mobile). Conditional rendering (no Radix portals) for happy-dom testability.
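
The self-healing sidecar load could look roughly like the sketch below. The on-disk layout (a top-level JSON object mapping logical key to `{pinned}`) and the name `loadMemoryMeta` are assumptions for illustration; the plan only specifies that a corrupt file yields an empty store and that malformed data never crashes the load.

```typescript
// Hypothetical sketch: tolerant parsing of the pin sidecar contents.
interface MemoryMetaEntry {
  pinned: boolean;
}

type MemoryMeta = Record<string, MemoryMetaEntry>;

function loadMemoryMeta(rawFileContents: string): MemoryMeta {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawFileContents);
  } catch {
    // Corrupt file: start empty instead of failing (a real service would log).
    return {};
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return {};
  }
  const meta: MemoryMeta = {};
  for (const [key, value] of Object.entries(parsed as Record<string, unknown>)) {
    // Keep only well-formed entries; malformed ones are skipped, not fatal.
    if (typeof value === "object" && value !== null) {
      const pinned = (value as { pinned?: unknown }).pinned;
      if (typeof pinned === "boolean") {
        meta[key] = { pinned };
      }
    }
  }
  return meta;
}
```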

Tests: UI tests for list/edit/conflict flows; router tests for sha precondition and experiment-off ⇒ memory.* routes reject; sidecar self-healing (corrupt meta file ⇒ empty, no crash).

Gate G2 (dogfood, blocking): agent-browser session — edit a memory in the panel while an agent task is running → agent's next view sees the edit; force a conflicting save (agent writes between UI load and UI save) → conflict surfaces, no silent clobber; pin a file → badge/state survives reload. Screenshots + recording attached.

Phase 3 — Usage stats, hot set, preloading (~400 net LoC)

  • Sidecar outside the memory roots (host-local, e.g. <muxHome>/memory-meta.json, atomic writes): {lastAccessedAt, accessCount, lastWriteAt, pinned} keyed by logical identity, not physical path: global:<relpath>, project:<projectId>:<relpath>, workspace:<workspaceId>:<relpath>. projectId = the existing stable project identity from Mux config where available, else a hash of {runtimeKind, remote host identity, normalized project root}; never the physical worktree path, so N worktrees of one repo share heat/pins, branch-agnostic by relpath. Recorded at the MemoryService chokepoint; never git-tracked.
  • Three context tiers: index (always, late context block — P1) → hot set (user-pinned via sidecar + top-K by recency/frequency under budgets, e.g. 16KB/item, 48KB total in src/common/constants/memory.ts) → cold (tool call). Auto-hot selection is gated on actual local usage (sidecar stats), which a cloned repo cannot fake.
  • Hot-set injection mirrors the post-compaction loaded-skill snapshot mechanism (loadedSkillSnapshots.ts / compactionHandler.ts / attachments.ts budgets): recomputed only at session start and compaction boundaries — never per turn — to preserve provider prompt caching.
  • Heat-metric guards: writes/pins count as uses; age decay; known limitation (self-reinforcing hot set) documented; demotion-sampling deferred to P4.
  • Memory tab shows lastAccessed/accessCount.
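
Hot-set selection as described (pinned first, then top-K by decayed usage, greedy fill under the 16KB/item and 48KB total budgets) might be sketched as follows. The half-life, the default K, the tie-breaking by key, and the choice to skip oversized items rather than truncate them are all assumptions made for this illustration.

```typescript
// Hypothetical sketch of hot-set selection; constants marked "assumed" are not from the PR.
interface HotCandidate {
  key: string; // logical identity, e.g. "global:prefs.md"
  bytes: number;
  pinned: boolean;
  accessCount: number;
  lastAccessedAt: number; // epoch milliseconds
}

const MAX_ITEM_BYTES = 16 * 1024;
const MAX_TOTAL_BYTES = 48 * 1024;
const HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000; // assumed: one week
const DEFAULT_TOP_K = 5; // assumed

// Access count halves for every HALF_LIFE_MS since the last access.
function decayedScore(candidate: HotCandidate, now: number): number {
  const ageMs = Math.max(0, now - candidate.lastAccessedAt);
  return candidate.accessCount * Math.pow(0.5, ageMs / HALF_LIFE_MS);
}

function byKey(a: HotCandidate, b: HotCandidate): number {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
}

function selectHotSet(candidates: readonly HotCandidate[], now: number, topK = DEFAULT_TOP_K): string[] {
  const pinned = candidates.filter((c) => c.pinned).sort(byKey);
  const auto = candidates
    .filter((c) => !c.pinned && c.accessCount > 0) // only files with real local usage
    .sort((a, b) => decayedScore(b, now) - decayedScore(a, now) || byKey(a, b))
    .slice(0, topK);

  const selected: string[] = [];
  let totalBytes = 0;
  for (const candidate of [...pinned, ...auto]) {
    if (candidate.bytes > MAX_ITEM_BYTES) continue; // assumed: skip, do not truncate
    if (totalBytes + candidate.bytes > MAX_TOTAL_BYTES) continue; // greedy fill
    selected.push(candidate.key);
    totalBytes += candidate.bytes;
  }
  return selected;
}
```

Sorting with an explicit key tie-break keeps the output deterministic for identical inputs, which matters if the rendered block is expected to stay byte-identical within a session segment.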

Gate G3 (dogfood, blocking): pinned memory provably in context without a tool call (bun run debug ui-messages); hot set byte-identical across consecutive turns within a session (cache stability); measured index + hot-set char/token cost reported with screenshots.
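
The cache-stability property that G3 checks can be illustrated with a tri-state cache: `undefined` means not yet computed, `null` means computed with nothing to inject, and the value is only discarded at a compaction boundary. `HotBlockCache` is a hypothetical stand-in for the session-level caching described in the commit messages, and the synchronous builder is a simplification; the real builder is presumably asynchronous.

```typescript
// Hypothetical sketch of per-session-segment caching of the hot-memory block.
class HotBlockCache {
  // undefined = not computed yet; null = computed, nothing to inject.
  private block: string | null | undefined = undefined;
  private readonly build: () => string | null;
  buildCount = 0;

  constructor(build: () => string | null) {
    this.build = build;
  }

  // Returns the same bytes on every turn until the segment ends.
  get(): string | null {
    if (this.block === undefined) {
      this.block = this.build();
      this.buildCount += 1;
    }
    return this.block;
  }

  // Called when a compaction boundary is consumed: the next turn recomputes.
  onCompactionBoundary(): void {
    this.block = undefined;
  }
}
```

Distinguishing `null` from `undefined` avoids rebuilding on every turn when there is simply nothing to inject, and it means a pin toggled mid-session only shows up after the next boundary, consistent with the limitation noted in the PR summary.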

Phase 4 — Later (not in this change set)

Pre-compaction "flush working context to memory" warning (compactionHandler hook); demotion-sampling experiment; cross-process locking if the global root is ever shared between processes; memory sync/import-export.

Total v1 (P0–P3): ~2,400 net LoC product code (range 2,300–2,900; P1 carries the plumbing risk).


Risks / notes

  • Strict-mode schema behavior is the biggest unknown → retired empirically at G1 step 3 before any UI work.
  • Index placement vs. caching: if no clean tail-injection mechanism exists, the end-of-system-message fallback invalidates message-history cache on mid-session index changes; G1 cache telemetry decides whether more work is needed before P2.
  • Prompt-injection: index lines come from repo-controlled frontmatter — same trust level as the existing skills index; content stays behind explicit tool calls or budget-capped hot tier.
  • .mux/memory + dotfile exclusion: the project root maps inside .mux/memory/, so the view dotfile-exclusion rule applies to its contents, not the root itself.
  • Upgrade/downgrade safe: experiment off by default; memory dirs are plain files; no migrations.
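
For the prompt-injection note, the index hardening rules (single line, about 200 characters, control characters stripped, quoted as data) could be sketched like this. The function names, the `[pinned]` marker, and the exact line layout are hypothetical; replacing control characters with a space (rather than deleting them) is a choice made here so that words on adjacent lines do not merge.

```typescript
// Hypothetical sketch of rendering one untrusted index line.
const MAX_DESCRIPTION_CHARS = 200;

function sanitizeDescription(raw: string): string {
  const singleLine = raw
    .replace(/[\u0000-\u001f\u007f-\u009f]+/g, " ") // control chars, incl. newlines and tabs
    .replace(/\s+/g, " ")
    .trim();
  if (singleLine.length <= MAX_DESCRIPTION_CHARS) {
    return singleLine;
  }
  // Truncate and mark the cut so the total stays within the cap.
  return `${singleLine.slice(0, MAX_DESCRIPTION_CHARS - 1)}…`;
}

function renderIndexLine(virtualPath: string, description: string | null, pinned: boolean): string {
  // JSON.stringify quotes the text as data and escapes embedded quotes.
  const quoted = description ? `: ${JSON.stringify(sanitizeDescription(description))}` : "";
  return `- ${virtualPath}${pinned ? " [pinned]" : ""}${quoted}`;
}
```

Quoting does not make the text trustworthy; it only keeps a repo-controlled description from breaking out of its line, which is why the index block is also described as labelling project metadata as untrusted input.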

Acceptance criteria

  1. With the experiment off: no memory tool in any request, no tab, the UI makes no memory calls, and the server rejects direct memory.* oRPC calls; zero context-size change.
  2. With it on: agent can view/create/str_replace/insert/delete/rename in all three scopes on local and SSH workspaces; create errors on existing files; traversal/symlink escapes rejected on both local and remote, and the mode/sub-agent access matrix is enforced (tests prove each cell).
  3. Memories persist across workspaces (global), travel with branches/PRs (project), die with the workspace (workspace).
  4. Users can view/edit/pin/delete every memory in the Memory tab with live agent-edit updates and conflict-safe saves.
  5. Pinned/hot memories appear in context without tool calls, within fixed budgets, cache-stable within a session.
  6. make static-check + targeted tests green; all three dogfood gates passed with attached evidence.

Generated with mux • Model: anthropic:claude-fable-5 • Thinking: max • Cost: $305.22

ThomasK33 added 24 commits June 11, 2026 19:31
Signed-off-by: Thomas Kosiewski <tk@coder.com>
… and change events

Signed-off-by: Thomas Kosiewski <tk@coder.com>
… per-scope write policy

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…t index injection, and coreServices

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…r-command display

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…built-in skill content

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…l identity

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…index entries carry scope+relPath

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…ve/delete/setPinned/onChange)

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…, editor, pins, and live agent-edit badges

Signed-off-by: Thomas Kosiewski <tk@coder.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…c fix

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…sedAt, lastWriteAt)

Pins now count as uses; unpinning preserves stats. Adds renameKeys/removeKeys
subtree maintenance and self-healing sanitization for malformed stats fields.

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…ogical keys

Every successful agent command (view/create/str_replace/insert/rename) and UI
read/save records a use in the sidecar; deletes and renames maintain sidecar
entries subtree-aware. MemoryScopeContext gains the stable projectPath identity
from Mux config (never the physical worktree path).

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…usted-content block

Pinned files rank first; auto-hot files rank by half-life-decayed access
frequency. Greedy fill under 16KB/item + 48KB total budgets (constants in
src/common/constants/memory.ts); preloading bypasses usage recording.

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…ndaries

AgentSession caches the rendered block per session segment (undefined = not
computed, null = nothing to inject) and invalidates it only when a pending
compaction boundary is consumed, keeping the injected bytes prompt-cache-stable.
AIService.buildHotMemoriesBlock gates on the memory experiment and self-heals
to null; streamContextBuilder appends the block after the memory index.

Signed-off-by: Thomas Kosiewski <tk@coder.com>
… tab

memory.list now returns sidecar usage stats per file; rows render
'Used N× · <relative time>' for used files only.

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…egration test

Gate G1 step 5: verifies create/view/strReplace and create-on-existing
rejection for project-scope memories through a real SSHRuntime against
the Docker sshd fixture (TEST_INTEGRATION=1 bun x jest tests/runtime/memory-ssh.test.ts).

Signed-off-by: Thomas Kosiewski <tk@coder.com>
memory.* oRPC inputs now take a nullish workspaceId. Without one, the
router resolves a stub scope context (runtime: null) so only the global
scope is reachable; project/workspace paths fail with recoverable
errors, mirroring the existing workspace-scope guard. This unblocks a
Settings-level UI for global memories that has no workspace at hand.

Signed-off-by: Thomas Kosiewski <tk@coder.com>
…moryBrowser

Extracts the list+editor into src/browser/features/Memory/
(MemoryBrowser + MemoryFileEditor) parameterized by workspaceId | null
and a scope filter so Settings can reuse them. The list now renders
collapsible scope sections with counts, bordered rows with a FileText
icon, muted dir prefixes for nested paths, counter-nums usage stats, a
filled accent pin indicator, an accent 'agent edited' chip, and
hover-revealed RowActionButtons. Delete confirms through
ConfirmationModal instead of window.confirm; the editor header gets a
RowActionButton back affordance and a proper Save button with saving
state. Adds MemoryTab stories and memory route mocks for Storybook.

Signed-off-by: Thomas Kosiewski <tk@coder.com>
New Settings → Memory section (experiment-gated like Governor) consumes
the shared MemoryBrowser with workspaceId null + global scope filter,
riding the workspace-independent memory routes. Includes section
stories and gating/redirect tests.

Signed-off-by: Thomas Kosiewski <tk@coder.com>
… tree

Within each scope section, files now group by their scope-relative path
segments into a collapsible tree: dir rows (chevron + folder icon +
recursive file count, default expanded), file leaves showing just the
basename with indent guides. Dirs sort before files, each alphabetically.
File rows keep description, usage stats, pin state, agent-edited badge,
and hover actions, but drop the card border for a flat hover-highlight
look. aria-labels keep the full scope-relative name for uniqueness.

Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33

Member Author

@codex review

@mintlify

mintlify Bot commented Jun 11, 2026


Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| Mux | 🟢 Ready | View Preview | Jun 11, 2026, 9:43 PM |


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 798f6ddff7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/node/services/tools/memory.ts Outdated
…sub-project workspaces

Codex review: config.cwd includes the sub-project segment on sub-project
workspaces, splitting agent-created project memories from the Memory tab,
index, and hot-set (which resolve the checkout root via
resolveWorkspaceRootPath). Pass the checkout root into ToolConfiguration
and prefer it in the memory tool's scope context.
@ThomasK33

Member Author

@codex review

Addressed: project memory now anchors at the workspace checkout root (resolveWorkspaceRootPath) instead of the sub-project execution cwd — added workspaceCheckoutRootPath to ToolConfiguration, preferred in the memory tool's scope context, with a regression test (memory.test.ts: "resolves project memory from the checkout root, not the execution cwd"). 7b1f891

…hor exists

Codex review round 7: multi-project workspaces execute in a shared
container dir that is not a git repository; anchoring /memories/project
there produced untracked files that die with the container. New shared
resolveMemoryProjectAnchor (single source of truth for the tool config,
index builder, hot-set, and oRPC routes) returns the single project
checkout root, or null for multi-project workspaces and unresolvable
roots — project commands then return a recoverable 'unavailable' error
while global/workspace scopes keep working.
@ThomasK33

Member Author

@codex review

Addressed: added resolveMemoryProjectAnchor(metadata, runtime) — the single resolution authority used by the memory tool config, <memory_index> builder, hot-set preloader, and memory.* oRPC routes. It returns the single-project checkout root, and null for multi-project workspaces (shared container is not a git repo) or unresolvable persisted roots; project-scope commands then fail with a recoverable "unavailable" error while global/workspace scopes keep working. Tests: "rejects project commands when no single checkout anchor exists" + "resolves the anchor to null for multi-project workspaces".
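A minimal sketch of the anchor rule described in this comment. The metadata shape (`projectCheckoutRoots`) and the names `resolveAnchorSketch` and `projectScopeRoot` are hypothetical; the real resolveMemoryProjectAnchor(metadata, runtime) signature and its error shape are not shown in this thread.

```typescript
// Hypothetical metadata shape for illustration; not the PR's actual type.
interface WorkspaceMetadataSketch {
  projectCheckoutRoots: string[];
}

type ScopeRootResult =
  | { ok: true; root: string }
  | { ok: false; error: "unavailable"; recoverable: true };

// Returns the single checkout root, or null when there is no unique anchor
// (multi-project workspace, or no resolvable root at all).
function resolveAnchorSketch(metadata: WorkspaceMetadataSketch): string | null {
  const roots = metadata.projectCheckoutRoots.filter((root) => root.length > 0);
  const only = roots[0];
  return roots.length === 1 && only !== undefined ? only : null;
}

// Project-scope commands fail with a recoverable error when the anchor is null;
// global and workspace scopes would be resolved elsewhere and are unaffected.
function projectScopeRoot(anchor: string | null): ScopeRootResult {
  if (anchor === null) {
    return { ok: false, error: "unavailable", recoverable: true };
  }
  return { ok: true, root: `${anchor}/.mux/memory` };
}
```

Resolving the anchor in one function and passing the result around is what lets the tool, index, hot-set, and routes agree on where project memory lives.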

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 328e8886e1

Comment thread src/node/services/memoryService.ts
Codex review round 8: committed trees bypass the write-time
MEMORY_MAX_FILES_PER_SCOPE cap, so a degenerate repo could force
thousands of per-file prefix reads (each a remote command over SSH)
during index/hot-set/tab builds. Enumeration now honors the cap: local
walks stop early, the remote find pipes through head, and the index
truncates (with a debug log) before any per-file reads.
@ThomasK33

Member Author

@codex review

Addressed: enumeration now honors MEMORY_MAX_FILES_PER_SCOPE before any per-file reads — the local walk stops at the cap (+1 for overflow detection), the remote find pipes through head -n cap+1 (bounding output transfer in the same single command), and listIndexEntries truncates with a debug log. Regression test: "caps indexed files per scope to the declared limit" (cap+25 committed files → exactly cap entries).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 39201a1ddd

Comment thread src/node/services/memoryService.ts
Comment thread src/node/services/memoryService.ts Outdated
…ped subsets

Codex review round 9: the local walk's cap check ran only before readdir,
so one flat directory could still accumulate every name before index
truncation. The cap now applies per entry, directories iterate in sorted
order, and the remote find pipes through sort before head — both stores
keep the lexicographically-first N files when a committed tree exceeds
the cap.
@ThomasK33

Member Author

@codex review

Both addressed:

  • Per-entry cap (P2): the local walk now checks the cap on every entry (a single flat directory can exceed it alone) and iterates each directory in sorted order, so the kept subset is bounded at cap+1 and deterministic.
  • Deterministic remote subset (P3): the remote listing pipes find through sort before head -n cap+1, keeping transfer bounded while matching the local lexicographic-first-N behavior.
  • Test extended: the cap test now asserts the kept subset is exactly f0000.md … f0999.md.
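The remote listing described above could be assembled roughly like this. This is a sketch only: `shellQuote` and `buildCappedListCommand` are hypothetical names, and the `-type f` filter and `LC_ALL=C` (to force byte-order sorting) are assumptions, not details confirmed in this thread.

```typescript
// POSIX single-quote escaping: close the quote, emit an escaped quote, reopen.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// One remote command: enumerate, sort deterministically, then bound the output.
// cap + 1 lines are requested so the caller can detect overflow.
function buildCappedListCommand(root: string, cap: number): string {
  const limit = cap + 1;
  return `find ${shellQuote(root)} -type f | LC_ALL=C sort | head -n ${limit}`;
}
```

Sorting before `head` is what makes the kept subset deterministic; `head` alone would keep whichever files `find` happened to emit first.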

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f6d9c47cb

Comment thread src/node/services/memoryService.ts
…ested trees

Sort directory entries with a path-string key (directories key as 'name/')
so the local DFS emits exact global lexicographic path order, matching the
remote 'find | sort | head' capped subset. A root file like 'a.md' now
survives the cap even when a sibling directory alone exceeds it.
@ThomasK33

Member Author

@codex review

Addressed the P2 (global lexicographic order under the enumeration cap) in 0cf6b5c: the local walk now sorts entries with a path-string key (directories key as name/), so the DFS emits exact global lexicographic path order and the capped subset matches the remote find | sort | head path. Regression test covers the exact scenario from the finding (a.md root file surviving the cap when sibling dir a/ alone exceeds it).
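To illustrate why the `name/` sort key matters, here is a small sketch over an in-memory tree. `TreeSketch` and `walkCapped` are hypothetical stand-ins for the real filesystem walk, and the sketch stops at exactly `cap` rather than cap+1.

```typescript
// null marks a file; a nested object marks a directory.
interface TreeSketch {
  [name: string]: TreeSketch | null;
}

function walkCapped(tree: TreeSketch, cap: number): string[] {
  const out: string[] = [];
  const visit = (node: TreeSketch, prefix: string): void => {
    const entries = Object.entries(node)
      // Directories sort as "name/", i.e. as the prefix their children's paths share.
      .map(([name, child]) => ({ name, child, key: child === null ? name : `${name}/` }))
      .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
    for (const entry of entries) {
      if (out.length >= cap) return; // per-entry cap check
      if (entry.child === null) {
        out.push(prefix + entry.name);
      } else {
        visit(entry.child, `${prefix}${entry.name}/`);
      }
    }
  };
  visit(tree, "");
  return out;
}

const example: TreeSketch = {
  a: { "1.md": null, "2.md": null, "3.md": null },
  "a.md": null,
  "b.md": null,
};
const kept = walkCapped(example, 2); // ["a.md", "a/1.md"]
```

Because `.` sorts before `/`, the key `a.md` precedes `a/`, so the root file is emitted before the directory's contents, which is the order a sort over full paths gives. Sorting on the bare name `a` would visit the directory first and let it consume the whole cap.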

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cf6b5cecd

Comment thread src/node/services/memoryService.ts
Read paths (index/hot-set enumeration on stream startup, Memory tab list,
view) previously ensureRoot()'d each scope, leaving an untracked .mux/
directory in clean checkouts before any memory was written. Split the
symlink-safety gate (assertRootSafe) from root creation (ensureRoot): reads
assert safety and treat missing roots as empty; only file-creating writes
(create, UI save) materialize the root.

- LocalMemoryStore.assertContained tolerates a missing root (nothing under
  a nonexistent root exists, so containment is trivial).
- RuntimeMemoryStore commands run from cwd / (the old default cwd was the
  root itself, which may now legitimately be missing); listFiles guards
  root existence explicitly so a missing root lists as empty.
- view of a scope root with no files reads as an empty directory instead
  of an error: the scope always exists in the protocol.
@ThomasK33

Member Author

@codex review

Addressed the read-only root creation finding in ecda4fb: split the symlink-safety gate (assertRootSafe) from root creation (ensureRoot). Read paths (index/hot-set enumeration, Memory tab list, view) now assert safety only and treat missing roots as empty; only file-creating writes (create, UI save) materialize <checkout>/.mux/memory. Remote store commands now run from cwd / with an explicit existence guard on enumeration so a missing root lists as empty over SSH too. Regression tests assert a clean checkout stays untouched (no .mux/) across enumeration, virtual-root view, scope-root view, and missing-file reads — on both local and remote runtimes; verified against the real sshd integration fixture.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecda4fb7a7

Comment thread src/node/services/memoryService.ts
…a committed names/descriptions

parseMemoryPath now rejects '<', '>' and '"' in path segments: nested
committed names could reassemble structure-breaking markup once joined with
'/' ('a<' + 'memory_index>pwn.md' renders as 'a</memory_index>pwn.md' in the
<memory_index> block). Windows forbids these in filenames anyway, so the
restriction also keeps git-tracked project memories checkout-able
cross-platform. Index enumeration already re-validates committed names, so
hostile files are skipped rather than rendered.

Frontmatter descriptions (repo-controlled, display-only) are now escaped at
the index render sink with the same XML-escaping helper the hot-set renderer
uses, so they cannot close the block or its quotes.
@ThomasK33

Member Author

@codex review

Addressed the memory-index prompt-block breakout finding in b4fb1ed:

  • parseMemoryPath now rejects <, > and " in path segments: nested committed names can reassemble block-closing markup across segments once joined with / (a< + memory_index>pwn.md → a</memory_index>pwn.md), so per-segment rejection is required. Windows forbids these in filenames anyway, which also keeps git-tracked project memories checkout-able cross-platform. Index enumeration re-validates committed names, so hostile files are skipped rather than rendered (regression test asserts exactly one </memory_index> delimiter in the rendered block).
  • Frontmatter descriptions (display-only) are escaped at the index render sink with the same escapeXmlAttribute helper the hot-set renderer uses, so they cannot close the block or its quotes.
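The two defenses above can be sketched as follows. `isSafeSegment`, `isSafeRelativePath`, and `escapeAttrSketch` are hypothetical names; the actual parseMemoryPath and escapeXmlAttribute implementations are not shown in this thread and may differ.

```typescript
// Characters that could open, close, or quote-break markup in the rendered index.
const FORBIDDEN_IN_SEGMENT = /[<>"]/;

function isSafeSegment(segment: string): boolean {
  return segment.length > 0 && !FORBIDDEN_IN_SEGMENT.test(segment);
}

// Validate every segment: a closing tag can be split across two directory levels,
// so checking only the final filename would not be enough.
function isSafeRelativePath(relativePath: string): boolean {
  return relativePath.split("/").every(isSafeSegment);
}

// Escape a display-only value before it is placed inside a quoted attribute.
// "&" is replaced first so the entities added afterwards are not double-escaped.
function escapeAttrSketch(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```

Names are rejected while descriptions are escaped because a path must round-trip byte-for-byte to a file, whereas a description is only ever displayed.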

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b4fb1edd5e

Comment thread src/node/services/memoryHotSet.ts Outdated
Comment thread src/browser/features/Memory/MemoryBrowser.tsx
…hange badges

- formatHotMemoriesBlock: repo-controlled content could contain
  '</memory_file></hot_memories>' to smuggle text outside the untrusted-data
  wrapper. Neutralize the exact closing sequences (full XML-escaping would
  mangle code-heavy notes); the replacement never reintroduces '</'.
- MemoryBrowser: workspace-scope memories are per-workspace state, but the
  change listener filtered by scope only, so another workspace's agent edit
  badged the same virtual path here. Ignore workspace-scope events from
  other workspaces; global/project scopes stay live (shared).
@ThomasK33

Member Author

@codex review

Both findings addressed in 1e8086a:

  • Hot-memory content delimiters (P1): formatHotMemoriesBlock now neutralizes the exact block-closing sequences (</memory_file>, </hot_memories>) in preloaded content (</ → &lt;/), so repo-controlled bytes can no longer appear outside the untrusted-data wrapper. Targeted neutralization rather than full XML-escaping keeps code-heavy notes legible; the replacement never reintroduces </, so one pass suffices. Regression test asserts exactly one closing delimiter of each kind in the rendered block.
  • Workspace change badges (P2): MemoryBrowser now ignores workspace-scope events whose event.workspaceId differs from the bound workspace, so another workspace's /memories/workspace/... edit no longer badges the same virtual path here. Global/project events stay live as intended. Test covers both directions (foreign event → no badge; own event → badge).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e8086ac69

Comment thread src/browser/features/Memory/MemoryBrowser.tsx Outdated
MemoryChangeEvent now carries the emitting scope context's stable
projectPath. The memory.onChange subscription (already workspace-bound)
filters server-side: workspace-scope events from other workspaces and
project-scope events from other projects are dropped — the same virtual
path elsewhere is a physically different file, so it must not refresh or
badge this subscriber's list. Global events stay shared. The R12 UI-side
workspace filter moved into the router with the rest (single authoritative
place; any client inherits correct behavior).
@ThomasK33

Member Author

@codex review

Addressed in ed6d437: MemoryChangeEvent now carries the emitter's stable projectPath, and the memory.onChange subscription filters server-side — workspace-scope events from other workspaces and project-scope events from other projects are dropped before reaching any client (the same virtual path elsewhere is a physically different file). Global events stay shared. The previous UI-side workspace filter moved into the router along with the new project filter so there is a single authoritative filtering point. Router test covers all five lanes: foreign-workspace workspace event (dropped), foreign-project project event (dropped), global from anywhere (delivered), own workspace event (delivered), same-project event from a sibling workspace (delivered).
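The five lanes listed above reduce to a small predicate. This is a sketch with hypothetical types (`ChangeEventSketch`, `SubscriberSketch`) and a hypothetical `shouldDeliver` name; the real MemoryChangeEvent fields beyond scope, workspaceId, and projectPath are not shown here.

```typescript
type MemoryScope = "global" | "project" | "workspace";

interface ChangeEventSketch {
  scope: MemoryScope;
  workspaceId: string;
  projectPath: string;
}

interface SubscriberSketch {
  workspaceId: string;
  projectPath: string;
}

// Server-side filter: the same virtual path in another workspace or project
// is a physically different file, so its events must not reach this subscriber.
function shouldDeliver(event: ChangeEventSketch, subscriber: SubscriberSketch): boolean {
  if (event.scope === "global") {
    return true; // shared by everyone on the host
  }
  if (event.scope === "project") {
    return event.projectPath === subscriber.projectPath;
  }
  return event.workspaceId === subscriber.workspaceId;
}
```

Note that project scope compares `projectPath` only, so a sibling workspace of the same project still delivers.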

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed6d43782d

Comment thread src/node/services/memoryHotSet.ts Outdated
…et content

'</memory_file >' or '</hot_memories\n>' survived the exact replaceAll
while a model may read them as equivalent closers; match optional
whitespace before '>' (and case variants) with one regex, preserving the
matched text in the neutralized output.
@ThomasK33

Member Author

@codex review

Addressed in the latest commit: neutralizeMemoryContent now matches closing tags with optional whitespace before > (and case variants) via /<\/(memory_file|hot_memories)(\s*)>/gi, preserving the matched name/whitespace in the neutralized &lt;/...> output. The replacement never reintroduces </, so one pass still suffices. Regression test covers </hot_memories >, </hot_memories\n>, and </MEMORY_FILE> and asserts exactly one structural closer of each kind (any spelling) remains in the block.
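A sketch of that neutralization using the regex quoted above. The function name `neutralizeSketch` and the exact replacement string are assumptions inferred from the description ("preserving the matched name/whitespace in the neutralized &lt;/...> output"), not the PR's code.

```typescript
// Closing tags for the two wrapper elements, in any letter case,
// with optional whitespace (including newlines) before ">".
const STRUCTURAL_CLOSER = /<\/(memory_file|hot_memories)(\s*)>/gi;

// Replace only the leading "<" with "&lt;"; $1 and $2 keep the matched
// tag name and whitespace so the note stays readable.
function neutralizeSketch(content: string): string {
  return content.replace(STRUCTURAL_CLOSER, "&lt;/$1$2>");
}

const hostile = "note</MEMORY_FILE></hot_memories\n>tail </div>";
const safe = neutralizeSketch(hostile);
// safe === "note&lt;/MEMORY_FILE>&lt;/hot_memories\n>tail </div>"
```

Unrelated markup such as `</div>` is left alone, which is the point of targeting only the wrapper's own closers instead of escaping everything.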

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3ee635357e

Comment thread src/node/services/agentSession.ts Outdated
…Ready

AgentSession computed and segment-cached the block before streamMessage ran
its ensureReady startup path; on a stopped Docker/remote workspace,
project-scope listing failed silently and the empty/partial block stayed
cached for the whole segment. streamMessage now takes a
resolveHotMemoriesBlock callback invoked right after ensureReady succeeds —
the session still owns the per-segment cache (and its compaction-boundary
reset still happens before the stream), and runtime-start UX stays on
streamMessage's statusSink path.
@ThomasK33

Member Author

@codex review

Addressed in the latest commit: streamMessage now receives a resolveHotMemoriesBlock callback instead of a pre-rendered string and invokes it immediately after runtime.ensureReady() succeeds — so project-scope listing on Docker/remote workspaces always sees a running runtime before anything is cached. AgentSession still owns the per-segment cache, the compaction-boundary reset still precedes the stream (so a just-consumed boundary recomputes for the same stream), and the container-start UX stays on streamMessage's statusSink path rather than running silently before stream startup. New regression test wraps runtimeFactory.createRuntime to record call order and asserts the resolver runs strictly after ensureReady and that the resolved block reaches the stream system context.
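The ordering described above can be sketched like this. All names (`RuntimeSketch`, `SegmentCacheSketch`, `streamMessageSketch`) are hypothetical, and the real streamMessage does far more than this; the sketch only shows the callback being invoked after ensureReady while the session side keeps the cache.

```typescript
interface RuntimeSketch {
  ensureReady(): Promise<void>;
}

// Session-owned cache: computes the block at most once per segment.
class SegmentCacheSketch {
  private cached: string | undefined = undefined;
  private readonly compute: () => Promise<string>;

  constructor(compute: () => Promise<string>) {
    this.compute = compute;
  }

  // Called at a compaction boundary so the next stream recomputes.
  reset(): void {
    this.cached = undefined;
  }

  async resolve(): Promise<string> {
    if (this.cached === undefined) {
      this.cached = await this.compute();
    }
    return this.cached;
  }
}

// The stream receives a resolver instead of a pre-rendered string, so nothing
// is listed (or cached) until the runtime is known to be running.
async function streamMessageSketch(
  runtime: RuntimeSketch,
  resolveHotMemoriesBlock: () => Promise<string>
): Promise<string> {
  await runtime.ensureReady();
  const block = await resolveHotMemoriesBlock();
  return `system context\n${block}`;
}
```

Passing a callback keeps cache ownership in the session while letting the stream decide when it is safe to compute.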

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep it up!
