🤖 feat: agent memory — six-command memory tool, three scopes, curation UI, hot-set preloading (experiment)#3526
ThomasK33 wants to merge 41 commits into
Signed-off-by: Thomas Kosiewski <tk@coder.com>
… and change events Signed-off-by: Thomas Kosiewski <tk@coder.com>
… per-scope write policy Signed-off-by: Thomas Kosiewski <tk@coder.com>
…t index injection, and coreServices Signed-off-by: Thomas Kosiewski <tk@coder.com>
…r-command display Signed-off-by: Thomas Kosiewski <tk@coder.com>
…built-in skill content Signed-off-by: Thomas Kosiewski <tk@coder.com>
…l identity Signed-off-by: Thomas Kosiewski <tk@coder.com>
…index entries carry scope+relPath Signed-off-by: Thomas Kosiewski <tk@coder.com>
…ve/delete/setPinned/onChange) Signed-off-by: Thomas Kosiewski <tk@coder.com>
…, editor, pins, and live agent-edit badges Signed-off-by: Thomas Kosiewski <tk@coder.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…#1) Signed-off-by: Thomas Kosiewski <tk@coder.com>
…c fix Signed-off-by: Thomas Kosiewski <tk@coder.com>
…sedAt, lastWriteAt) Pins now count as uses; unpinning preserves stats. Adds renameKeys/removeKeys subtree maintenance and self-healing sanitization for malformed stats fields. Signed-off-by: Thomas Kosiewski <tk@coder.com>
…ogical keys Every successful agent command (view/create/str_replace/insert/rename) and UI read/save records a use in the sidecar; deletes and renames maintain sidecar entries subtree-aware. MemoryScopeContext gains the stable projectPath identity from Mux config (never the physical worktree path). Signed-off-by: Thomas Kosiewski <tk@coder.com>
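The logical-key scheme and the subtree-aware rename mentioned in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the key formats follow the implementation plan (`global:<relpath>`, `project:<projectId>:<relpath>`, `workspace:<workspaceId>:<relpath>`), while the `ScopeRef` type and the body of `renameKeys` are assumptions made for the example.

```typescript
// Hypothetical sidecar key helpers; shapes and bodies are illustrative only.
type ScopeRef =
  | { scope: "global" }
  | { scope: "project"; projectId: string }
  | { scope: "workspace"; workspaceId: string };

// Build the logical (never physical) key for a scope-relative path.
function logicalKey(ref: ScopeRef, relPath: string): string {
  if (ref.scope === "global") return `global:${relPath}`;
  if (ref.scope === "project") return `project:${ref.projectId}:${relPath}`;
  return `workspace:${ref.workspaceId}:${relPath}`;
}

// Move the entry at `fromKey` and every entry below it (prefix `fromKey/`)
// so that usage stats follow a renamed file or directory.
function renameKeys<T>(stats: Map<string, T>, fromKey: string, toKey: string): void {
  const moves: Array<{ oldKey: string; newKey: string; value: T }> = [];
  stats.forEach((value, key) => {
    if (key === fromKey) {
      moves.push({ oldKey: key, newKey: toKey, value });
    } else if (key.startsWith(fromKey + "/")) {
      moves.push({ oldKey: key, newKey: toKey + key.slice(fromKey.length), value });
    }
  });
  // Delete first, then insert, so a new key can never be clobbered by a later delete.
  for (const move of moves) stats.delete(move.oldKey);
  for (const move of moves) stats.set(move.newKey, move.value);
}
```

Matching on `fromKey + "/"` rather than on the bare prefix keeps a sibling such as `notes2.md` untouched when the directory `notes` is renamed.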
…usted-content block Pinned files rank first; auto-hot files rank by half-life-decayed access frequency. Greedy fill under 16KB/item + 48KB total budgets (constants in src/common/constants/memory.ts); preloading bypasses usage recording. Signed-off-by: Thomas Kosiewski <tk@coder.com>
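One way to read the ranking and budget rules in this commit is the sketch below. The 16KB per-item and 48KB total budgets come from the commit message; the `Candidate` shape, the 7-day half-life, the tie-break on key, and the exact heat formula are assumptions chosen only to make the example concrete.

```typescript
// Hypothetical candidate shape; the real stats live in the PR's sidecar.
interface Candidate {
  key: string;
  bytes: number;
  pinned: boolean;
  accessCount: number;
  lastAccessedAt: number; // epoch milliseconds
}

const MAX_ITEM_BYTES = 16 * 1024; // per-item budget (from the commit message)
const MAX_TOTAL_BYTES = 48 * 1024; // total budget (from the commit message)
const HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000; // assumed half-life, not from the PR

// Access frequency decayed by half for every HALF_LIFE_MS since the last access.
function heat(c: Candidate, now: number): number {
  const age = Math.max(0, now - c.lastAccessedAt);
  return c.accessCount * Math.pow(0.5, age / HALF_LIFE_MS);
}

function selectHotSet(candidates: Candidate[], now: number): string[] {
  const ranked = candidates
    // Auto-hot requires real local usage; pins always qualify.
    .filter((c) => c.pinned || c.accessCount > 0)
    .sort((a, b) => {
      if (a.pinned !== b.pinned) return a.pinned ? -1 : 1; // pinned first
      const diff = heat(b, now) - heat(a, now); // hotter first
      if (diff !== 0) return diff;
      return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; // deterministic tie-break
    });

  const selected: string[] = [];
  let used = 0;
  for (const c of ranked) {
    if (c.bytes > MAX_ITEM_BYTES) continue; // too large for a single slot
    if (used + c.bytes > MAX_TOTAL_BYTES) continue; // greedy: skip, try smaller items
    selected.push(c.key);
    used += c.bytes;
  }
  return selected;
}
```

Because the function only reads stats and never records a use, it is consistent with the note that preloading bypasses usage recording.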
…ndaries AgentSession caches the rendered block per session segment (undefined = not computed, null = nothing to inject) and invalidates it only when a pending compaction boundary is consumed, keeping the injected bytes prompt-cache-stable. AIService.buildHotMemoriesBlock gates on the memory experiment and self-heals to null; streamContextBuilder appends the block after the memory index. Signed-off-by: Thomas Kosiewski <tk@coder.com>
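The tri-state cache described in this commit (undefined = not computed, null = nothing to inject) might look roughly like the sketch below. The class name `HotBlockCache` and its methods are hypothetical, and the real builder is asynchronous; a synchronous `compute` callback is used here only to keep the example short.

```typescript
// Hypothetical per-segment cache; names are illustrative, not the PR's API.
class HotBlockCache {
  // undefined = not computed yet, null = computed and nothing to inject.
  private cached: string | null | undefined = undefined;
  private pendingBoundary = false;

  // Called when a compaction boundary is scheduled for the next stream.
  markCompactionBoundary(): void {
    this.pendingBoundary = true;
  }

  // Returns the cached block, recomputing only on first use or after a boundary.
  get(compute: () => string | null): string | null {
    if (this.pendingBoundary) {
      this.cached = undefined; // consume the boundary and invalidate
      this.pendingBoundary = false;
    }
    if (this.cached === undefined) {
      try {
        this.cached = compute();
      } catch {
        this.cached = null; // self-heal: a failed build injects nothing
      }
    }
    return this.cached;
  }
}
```

Keeping `null` distinct from `undefined` means an empty result is also cached, so a session with no hot memories does not recompute on every turn.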
… tab memory.list now returns sidecar usage stats per file; rows render 'Used N× · <relative time>' for used files only. Signed-off-by: Thomas Kosiewski <tk@coder.com>
…egration test Gate G1 step 5: verifies create/view/strReplace and create-on-existing rejection for project-scope memories through a real SSHRuntime against the Docker sshd fixture (TEST_INTEGRATION=1 bun x jest tests/runtime/memory-ssh.test.ts). Signed-off-by: Thomas Kosiewski <tk@coder.com>
…nstructor and projectPath
memory.* oRPC inputs now take a nullish workspaceId. Without one, the router resolves a stub scope context (runtime: null) so only the global scope is reachable; project/workspace paths fail with recoverable errors, mirroring the existing workspace-scope guard. This unblocks a Settings-level UI for global memories that has no workspace at hand. Signed-off-by: Thomas Kosiewski <tk@coder.com>
…moryBrowser Extracts the list+editor into src/browser/features/Memory/ (MemoryBrowser + MemoryFileEditor) parameterized by workspaceId | null and a scope filter so Settings can reuse them. The list now renders collapsible scope sections with counts, bordered rows with a FileText icon, muted dir prefixes for nested paths, counter-nums usage stats, a filled accent pin indicator, an accent 'agent edited' chip, and hover-revealed RowActionButtons. Delete confirms through ConfirmationModal instead of window.confirm; the editor header gets a RowActionButton back affordance and a proper Save button with saving state. Adds MemoryTab stories and memory route mocks for Storybook. Signed-off-by: Thomas Kosiewski <tk@coder.com>
New Settings → Memory section (experiment-gated like Governor) consumes the shared MemoryBrowser with workspaceId null + global scope filter, riding the workspace-independent memory routes. Includes section stories and gating/redirect tests. Signed-off-by: Thomas Kosiewski <tk@coder.com>
… tree Within each scope section, files now group by their scope-relative path segments into a collapsible tree: dir rows (chevron + folder icon + recursive file count, default expanded), file leaves showing just the basename with indent guides. Dirs sort before files, each alphabetically. File rows keep description, usage stats, pin state, agent-edited badge, and hover actions, but drop the card border for a flat hover-highlight look. aria-labels keep the full scope-relative name for uniqueness. Signed-off-by: Thomas Kosiewski <tk@coder.com>
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 798f6ddff7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…sub-project workspaces Codex review: config.cwd includes the sub-project segment on sub-project workspaces, splitting agent-created project memories from the Memory tab, index, and hot-set (which resolve the checkout root via resolveWorkspaceRootPath). Pass the checkout root into ToolConfiguration and prefer it in the memory tool's scope context.
@codex review Addressed: project memory now anchors at the workspace checkout root (
…hor exists Codex review round 7: multi-project workspaces execute in a shared container dir that is not a git repository; anchoring /memories/project there produced untracked files that die with the container. New shared resolveMemoryProjectAnchor (single source of truth for the tool config, index builder, hot-set, and oRPC routes) returns the single project checkout root, or null for multi-project workspaces and unresolvable roots — project commands then return a recoverable 'unavailable' error while global/workspace scopes keep working.
@codex review Addressed: added
Reviewed commit: 328e8886e1
Codex review round 8: committed trees bypass the write-time MEMORY_MAX_FILES_PER_SCOPE cap, so a degenerate repo could force thousands of per-file prefix reads (each a remote command over SSH) during index/hot-set/tab builds. Enumeration now honors the cap: local walks stop early, the remote find pipes through head, and the index truncates (with a debug log) before any per-file reads.
@codex review Addressed: enumeration now honors
Reviewed commit: 39201a1ddd
…ped subsets Codex review round 9: the local walk's cap check ran only before readdir, so one flat directory could still accumulate every name before index truncation. The cap now applies per entry, directories iterate in sorted order, and the remote find pipes through sort before head — both stores keep the lexicographically-first N files when a committed tree exceeds the cap.
@codex review Both addressed:
Reviewed commit: 8f6d9c47cb
…ested trees Sort directory entries with a path-string key (directories key as 'name/') so the local DFS emits exact global lexicographic path order, matching the remote 'find | sort | head' capped subset. A root file like 'a.md' now survives the cap even when a sibling directory alone exceeds it.
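The path-string sort key from this commit can be illustrated on an in-memory tree. This is only a sketch under assumptions: the `Tree` type and `walkCapped` are hypothetical stand-ins for the local directory walk, and comparison uses plain code-unit order.

```typescript
// Hypothetical in-memory tree: null marks a file, an object marks a directory.
type Tree = { [name: string]: Tree | null };

// Depth-first walk that stops once `cap` files are collected. Directories sort
// under the key "name/", so the emitted order equals a global sort of full paths.
function walkCapped(tree: Tree, cap: number, prefix: string = "", out: string[] = []): string[] {
  const names = Object.keys(tree).sort((a, b) => {
    const keyA = tree[a] === null ? a : a + "/";
    const keyB = tree[b] === null ? b : b + "/";
    return keyA < keyB ? -1 : keyA > keyB ? 1 : 0;
  });
  for (const name of names) {
    if (out.length >= cap) break; // the cap applies per entry, not per directory
    const child = tree[name];
    if (child === null) out.push(prefix + name);
    else walkCapped(child, cap, prefix + name + "/", out);
  }
  return out;
}
```

With a root file `a.md` next to a directory `a`, the key `a.md` sorts before `a/` because `.` precedes `/` in code-unit order, which is why the root file survives a small cap.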
@codex review Addressed the P2 (global lexicographic order under the enumeration cap) in 0cf6b5c: the local walk now sorts entries with a path-string key (directories key as
Reviewed commit: 0cf6b5cecd
Read paths (index/hot-set enumeration on stream startup, Memory tab list, view) previously ensureRoot()'d each scope, leaving an untracked .mux/ directory in clean checkouts before any memory was written. Split the symlink-safety gate (assertRootSafe) from root creation (ensureRoot): reads assert safety and treat missing roots as empty, only file-creating writes (create, UI save) materialize the root. - LocalMemoryStore.assertContained tolerates a missing root (nothing under a nonexistent root exists, so containment is trivial). - RuntimeMemoryStore commands run from cwd / (the old default cwd was the root itself, which may now legitimately be missing); listFiles guards root existence explicitly so a missing root lists as empty. - view of a scope root with no files reads as an empty directory instead of an error: the scope always exists in the protocol.
@codex review Addressed the read-only root creation finding in ecda4fb: split the symlink-safety gate (
Reviewed commit: ecda4fb7a7
…a committed names/descriptions
parseMemoryPath now rejects '<', '>' and '"' in path segments: nested
committed names could reassemble structure-breaking markup once joined with
'/' ('a<' + 'memory_index>pwn.md' renders as 'a</memory_index>pwn.md' in the
<memory_index> block). Windows forbids these in filenames anyway, so the
restriction also keeps git-tracked project memories checkout-able
cross-platform. Index enumeration already re-validates committed names, so
hostile files are skipped rather than rendered.
Frontmatter descriptions (repo-controlled, display-only) are now escaped at
the index render sink with the same XML-escaping helper the hot-set renderer
uses, so they cannot close the block or its quotes.
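The two defenses in this commit message, rejecting markup characters in path segments and escaping descriptions at the render sink, can be sketched as below. `isSafeSegment` and `escapeXml` are hypothetical helpers written for illustration; the PR's `parseMemoryPath` presumably performs more checks than shown here.

```typescript
// Characters the commit says are rejected in path segments.
const FORBIDDEN_SEGMENT_CHARS = /[<>"]/;

// Hypothetical per-segment check; the real parser is assumed to do more.
function isSafeSegment(segment: string): boolean {
  if (segment.length === 0 || segment === "." || segment === "..") return false;
  return !FORBIDDEN_SEGMENT_CHARS.test(segment);
}

// Minimal XML escaping for display-only text rendered inside a prompt block.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```

Checking each segment separately matters here because the attack in the commit message only becomes structure-breaking after the segments are joined with `/`.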
@codex review Addressed the memory-index prompt-block breakout finding in b4fb1ed:
Reviewed commit: b4fb1edd5e
…hange badges - formatHotMemoriesBlock: repo-controlled content could contain '</memory_file></hot_memories>' to smuggle text outside the untrusted-data wrapper. Neutralize the exact closing sequences (full XML-escaping would mangle code-heavy notes); the replacement never reintroduces '</'. - MemoryBrowser: workspace-scope memories are per-workspace state, but the change listener filtered by scope only, so another workspace's agent edit badged the same virtual path here. Ignore workspace-scope events from other workspaces; global/project scopes stay live (shared).
@codex review Both findings addressed in 1e8086a:
Reviewed commit: 1e8086ac69
MemoryChangeEvent now carries the emitting scope context's stable projectPath. The memory.onChange subscription (already workspace-bound) filters server-side: workspace-scope events from other workspaces and project-scope events from other projects are dropped — the same virtual path elsewhere is a physically different file, so it must not refresh or badge this subscriber's list. Global events stay shared. The R12 UI-side workspace filter moved into the router with the rest (single authoritative place; any client inherits correct behavior).
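The server-side filter described in this commit reduces to a small predicate. The sketch below assumes simplified shapes for the event and the subscriber (`ChangeEvent`, `Subscriber`, and `shouldDeliver` are hypothetical names); only the rule itself is taken from the commit message.

```typescript
type Scope = "global" | "project" | "workspace";

// Simplified, hypothetical shapes; the real event carries more fields.
interface ChangeEvent {
  scope: Scope;
  path: string; // virtual path, e.g. /memories/project/notes.md
  workspaceId: string | null;
  projectPath: string | null; // stable project identity, not a worktree path
}

interface Subscriber {
  workspaceId: string;
  projectPath: string | null;
}

// Decide whether an event should reach a workspace-bound subscription.
function shouldDeliver(event: ChangeEvent, sub: Subscriber): boolean {
  if (event.scope === "global") return true; // global memories are shared
  if (event.scope === "project") {
    // Same virtual path in another project is a physically different file.
    return event.projectPath !== null && event.projectPath === sub.projectPath;
  }
  // Workspace scope: only events from this subscriber's own workspace.
  return event.workspaceId === sub.workspaceId;
}
```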
@codex review Addressed in ed6d437:
Reviewed commit: ed6d43782d
…et content '</memory_file >' or '</hot_memories\n>' survived the exact replaceAll while a model may read them as equivalent closers; match optional whitespace before '>' (and case variants) with one regex, preserving the matched text in the neutralized output.
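A single case-insensitive regex that tolerates whitespace before `>` could look like the sketch below. The tag names come from the commit messages; the replacement token (`&lt;/` in place of `</`) is an assumption, chosen because it keeps the matched name and whitespace while never reintroducing `</`.

```typescript
// Matches </memory_file> and </hot_memories>, any letter case, with optional
// whitespace (including newlines) before the closing '>'.
const WRAPPER_CLOSER = /<\/(memory_file|hot_memories)(\s*)>/gi;

// Hypothetical neutralizer: rewrites only the wrapper closers and leaves all
// other content (including unrelated markup in code-heavy notes) untouched.
function neutralizeClosers(content: string): string {
  return content.replace(
    WRAPPER_CLOSER,
    (_match: string, name: string, whitespace: string) => `&lt;/${name}${whitespace}>`,
  );
}
```

Targeting just the two wrapper closers, rather than escaping every `<`, follows the commit's reasoning that full XML escaping would mangle code-heavy notes.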
@codex review Addressed in the latest commit:
Reviewed commit: 3ee635357e
…Ready AgentSession computed and segment-cached the block before streamMessage ran its ensureReady startup path; on a stopped Docker/remote workspace, project-scope listing failed silently and the empty/partial block stayed cached for the whole segment. streamMessage now takes a resolveHotMemoriesBlock callback invoked right after ensureReady succeeds — the session still owns the per-segment cache (and its compaction-boundary reset still happens before the stream), and runtime-start UX stays on streamMessage's statusSink path.
@codex review Addressed in the latest commit:
Codex Review: Didn't find any major issues. Keep it up!
Summary
Adds agent memory to Mux behind a new memory experiment (off by default): a provider-agnostic memory tool implementing Anthropic's six-command protocol (view/create/str_replace/insert/delete/rename), backed by a main-process MemoryService with three scopes (global / project / workspace), a tree-styled curation UI (right-sidebar Memory tab + Settings → Memory for global files), and usage-tracked hot-memory preloading with prompt-cache-stable injection.
Background
Agents forget durable user preferences and project facts between workspaces. This PR gives them a filesystem-like memory surface they can read/write via tool calls, plus automatic context injection so frequently-used/pinned memories are available without tool calls. Design follows Anthropic's memory tool semantics so models prompted for that protocol work out of the box, while staying provider-agnostic (flattened nullish schema; verified zero invalid tool calls across Anthropic/OpenAI/Google in dogfooding).
Implementation
- … /memories/... paths):
  - /memories/global/… → <muxHome>/memory/ (host-local, shared across projects)
  - /memories/project/… → <checkout>/.mux/memory/ via the Runtime abstraction (git-tracked, works over SSH; remote writes via temp+mv)
  - /memories/workspace/… → session dir (dies with the workspace)
- MemoryService: pre-resolution traversal rejection, realpath parent-walk symlink containment (local + shell-quoted remote), atomic writes, 100KB/file + 1000 files/scope caps, self-healing loads, plain-text-only rendering (memory content is attacker-influenceable).
- … memory-meta.json) keyed by logical identity, never in git-tracked files (a cloned repo must not be able to force itself into the hot tier).
- MemoryBrowser/MemoryFileEditor render scope sections with a native directory tree (chevrons, indent guides, recursive counts), descriptions, usage stats, pin/delete row actions, agent-edited badges via memory.onChange, and sha256-conflict-safe saves. Settings → Memory manages global files workspace-independently (workspaceId is nullish on memory.* routes). Server-side experiment gating on every route.
Validation
- make static-check green.
Risks
- … aiService/streamContextBuilder (system-message assembly) and agentSession (hot-block caching); both are additive and experiment-gated, with tests asserting byte-stable output and zero context change when disabled.
Pains
- dev-server-sandbox default seeding resumed parent-instance tasks and deleted a live agent worktree during gate G3 (recovered from committed state; --clean-projects avoids it). Worth a separate fix.
📋 Implementation Plan
Agent Memory in Mux — Implementation Plan
Adopt Anthropic's six-command memory protocol (view, create, str_replace, insert, delete, rename) as a provider-agnostic Mux tool, backed by a main-process MemoryService with three scopes (global / project / workspace), a curation UI in the right sidebar, and usage-tracked hot-memory preloading. Everything is gated behind a new memory experiment.
Grounded in: deep-research findings (verified against Anthropic SDK/docs, Vercel AI SDK, MCP memory server) + four repo investigations (experiments system, runtime abstraction, tool pipeline, service/IPC/UI patterns).
Locked decisions
- create on existing file: … delete + create. Documented in tool description
- … Runtime abstraction
- … memory experiment (EXPERIMENT_IDS.MEMORY), off by default, user-overridable, shown in Settings
Architecture
flowchart TD
  Agent["Agent tool call<br/>memory (6 commands)"] --> Norm["Zod preprocess + flattened<br/>nullish schema"]
  UI["Memory tab (right sidebar)<br/>view / edit / pin / delete"] -->|oRPC| Svc
  Norm --> Svc["MemoryService (main process)<br/>path validation · per-root mutex ·<br/>atomic writes · stats · events"]
  Svc -->|"local fs"| G["~/.mux/memory/ (global)"]
  Svc -->|"runtime.readFile/writeFile/exec<br/>(works over SSH)"| P["<checkout>/.mux/memory/<br/>(project, git-tracked)"]
  Svc -->|"local fs"| W["~/.mux/sessions/<ws>/memory/<br/>(workspace, ephemeral)"]
  Svc -.->|"EventEmitter → oRPC eventIterator"| UI
  Svc -.->|"index → late context block<br/>hot set → context injection (P3)"| Ctx["Model context"]
Scopes and physical mapping
Models only ever see virtual paths; MemoryService maps them. Physical paths never leak into context.
- /memories/global/… → <muxHome>/memory/ on the host (getMuxHome() from src/common/constants/paths.ts)
- /memories/project/… → <workspace checkout>/.mux/memory/ via Runtime (runtime.readFile/stat/ensureDir + exec for listing/realpath); writes: write-file-atomic on local runtimes, SSHRuntime.writeFile (temp+mv) on remote
- /memories/workspace/… → config.getSessionDir(workspaceId)/memory/ on the host
Cross-runtime notes (verified):
- … config.runtime + config.cwd; readFileString/writeFileString (src/node/utils/runtime/helpers.ts) work transparently over SSH (SSHRuntime.writeFile is already atomic: cat > tmp && mv).
- … resolveGlobalRuntime in agent skills).
- … RemoteRuntime → execBuffered(runtime, "find …") (see listSkillDirectoriesFromRuntime in src/node/services/agentSkills/agentSkillsService.ts).
Tool surface (model-facing)
One memory tool. Flattened object schema (not a discriminated union; strict-mode providers flatten unions poorly): command enum + all op-specific fields .nullish(), dispatch in the handler with != null checks. This follows the repo's documented conventions in src/common/utils/tools/toolDefinitions.ts.
- … command → script): file_path/filePath → path, content → file_text, old_string → old_str, new_string → new_str.
- view on a directory lists ≤2 levels deep, excludes dotfiles; str_replace requires unique old_str and returns matching line numbers on ambiguity (recoverable tool error); insert at line N; rename for move; delete for file/dir. create errors on existing files (locked decision).
- … description per file, plus a pinned marker once P2 sidecar pins exist) is injected at stream-context assembly (buildStreamSystemContext in src/node/services/streamContextBuilder.ts) as a late context block. Preferred placement: transient system-reminder-style block near the tail of the message list (cache-optimal; the tail changes every turn anyway), falling back to an end-of-system-message section if no tail-injection mechanism fits cleanly. This deliberately diverges from the skills index (which lives in agent_skill_read's description): skills change rarely, memory files change mid-session, and tool-description churn would invalidate the cached tool prefix. G1 measures cache hit rates to validate the placement.
File format
Markdown with optional YAML frontmatter carrying display metadata only (description, used for the index; same repo trust level as the existing skills index). Pins and usage stats never live in the files: a committed pinned: true in a cloned repo would force attacker-chosen content into the hot context tier, and stats in git-tracked files create commit noise/merge conflicts. Both live in a host-local, service-owned sidecar; pinning is a user/UI action only.
Security envelope (enforced once, in MemoryService)
Verified against Anthropic's reference implementations + existing skills containment checks:
- … ~, .. segments, URL-encoded traversal before resolution; then resolve and enforce prefix containment under the scope root.
- … fs.realpath walk (parent-walking when components don't exist; Python SDK _validate_no_symlink_escape pattern); project scope on remote runtimes via runtime.exec with every path shell-quoted, the root ensureDir'd first, then a remote realpath parent-walk containment check before any mutation or listing.
- … write-file-atomic (already a repo dependency), never raw streams; project scope on remote runtimes uses SSHRuntime.writeFile (already temp+mv).
- … description is repo-controlled: the index renders it single-line, truncated (~200 chars), control characters stripped, quoted as data; the index block notes that project memory metadata/content is untrusted input until the user/agent chooses to rely on it.
- … src/common/constants/); str_replace/insert only on UTF-8 text.
- … innerHTML-family sinks; no SECURITY-AUDIT-worthy sinks added.
Concurrency & conflicts
- … MemoryService → MutexMap keyed by physical root (src/node/utils/concurrency/mutexMap.ts) eliminates intra-process races. No filesystem locking in v1 (single main process; the dev-server sandbox uses its own MUX_ROOT, so cross-process collisions on the same global root are out of scope, a documented limitation).
- … str_replace is naturally optimistic (old_str must match → recoverable tool error on conflict).
- … contentSha256 captured at load; service rejects on mismatch (re-read & retry prompt), the verified optimistic-concurrency pattern.
Experiment gating (memory)
- MEMORY: "memory" in EXPERIMENT_IDS + registry entry in EXPERIMENTS (src/common/constants/experiments.ts): enabledByDefault: false, userOverridable: true, showInSettings: true → appears automatically in Settings → Experiments.
- … aiService.ts resolves experimentsService.isExperimentEnabled(EXPERIMENT_IDS.MEMORY) (with client override passthrough, same as DYNAMIC_WORKFLOWS) → passed into getToolsForModel (src/common/utils/tools/tools.ts) to conditionally register the memory tool. Experiment off ⇒ no tool, no index, zero context cost.
- … featureFlag: EXPERIMENT_IDS.MEMORY in TAB_CONFIG_DEF (src/browser/features/RightSidebar/Tabs/tabConfig.ts); components use useExperimentValue(EXPERIMENT_IDS.MEMORY).
- … memory.* routes themselves check isExperimentEnabled and reject when disabled; UI hiding alone is not the gate.
- … spyOn(experimentsService, "isExperimentEnabled"); frontend spyOn(ExperimentsModule, "useExperimentValue") (established patterns).
Mode / sub-agent write policy (command-level, enforced in the handler)
The memory tool handler receives a memoryAccess policy via ToolConfiguration (alongside the existing planFileOnly plumbing used by validatePlanModeAccess in src/node/services/tools/fileCommon.ts) and enforces it per command + scope, not via regex tool policy, which can only match tool names.
Rationale: plan mode must not mutate the tracked source tree, and project memories are git-tracked, so project scope is read-only in plan mode, while global/workspace writes (agent-owned, untracked state) stay allowed for capturing lessons during planning. Exec-mode project writes surface in the Review pane like any diff (a feature). Tests assert mutating commands are rejected per matrix cell.
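The access rules stated in this section can be written as a small predicate. This is a hedged reconstruction from the prose only (plan mode keeps project scope read-only, explore agents get view only, exec mode may write everywhere); the `Mode` names and the `isAllowed` function are hypothetical and may not match the PR's actual memoryAccess shape.

```typescript
type Command = "view" | "create" | "str_replace" | "insert" | "delete" | "rename";
type Scope = "global" | "project" | "workspace";
type Mode = "exec" | "plan" | "explore"; // assumed labels for the policy rows

// Returns true when `command` may run against `scope` under `mode`.
function isAllowed(mode: Mode, scope: Scope, command: Command): boolean {
  if (command === "view") return true; // reads are allowed in every cell
  if (mode === "explore") return false; // explore agents get view only
  if (mode === "plan" && scope === "project") return false; // tracked tree stays untouched
  return true; // exec anywhere; plan on global/workspace
}
```

Expressing the matrix as a pure function makes the "tests prove each cell" requirement straightforward: iterate every mode, scope, and command combination and compare against an expected table.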
Phases
Phase 0 — Experiment flag + constants (~40 net LoC)
- src/common/constants/experiments.ts: MEMORY id + definition.
- src/common/constants/memory.ts (new): scope ids, virtual root (/memories), caps, char budgets.
Phase 1 — MemoryService + memory tool (agent-facing MVP) (~1,200 net LoC)
Files:
- src/node/services/memoryService.ts (new, ~400): scope mapping, path validation + symlink containment (local + via-runtime), six command implementations, per-root MutexMap, caps, EventEmitter change events {scope, path, actor, workspaceId}. Constructor-injected via ServiceContainer/coreServices (existing DI pattern).
- src/node/services/tools/memory.ts (new, ~250): ToolFactory dispatching to MemoryService; static tool description (protocol + semantics); per-command+scope memoryAccess enforcement; recoverable error strings copied from the Anthropic SDK semantics.
- src/node/services/streamContextBuilder.ts (~60): build + inject the per-request memory index context block (experiment-gated).
- src/common/utils/tools/toolDefinitions.ts (~80): flattened schema + preprocessor shims.
- src/common/utils/tools/tools.ts + src/node/services/aiService.ts (~40): registration in runtimeTools (project scope needs workspace init), experiment-gated.
- src/browser/features/Tools/Shared/getToolComponent.ts + ToolPrimitives.tsx + src/browser/features/Tools/MemoryToolCall.tsx (new, ~150): chat renderer (per-command compact display), lucide icon mapping (no emoji).
- … memoryAccess matrix above (explore agents get view only).
Tests (not counted in LoC): traversal/symlink/atomicity/create-errors/str_replace-ambiguity unit tests against a temp dir (incl. local project create/edit atomicity); per-command handler tests; mode-matrix rejections per cell (plan-mode project mutations reject; explore mutations reject); remote containment (shell-quoted traversal + symlink-escape attempts through the runtime, SSH fixture); cross-workspace remember→recall integration test; experiment-off ⇒ tool absent.
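The str_replace ambiguity rule covered by these tests (old_str must match exactly once, otherwise a recoverable error listing the matching line numbers) can be sketched as follows. The function name, result type, and error wording are assumptions for illustration, not the PR's strings.

```typescript
// Hypothetical result type: either the edited text or a recoverable error.
type StrReplaceResult = { ok: true; text: string } | { ok: false; error: string };

function strReplaceUnique(text: string, oldStr: string, newStr: string): StrReplaceResult {
  if (oldStr.length === 0) return { ok: false, error: "old_str must not be empty" };

  // Collect the 1-based line number of every occurrence.
  const lines: number[] = [];
  let index = text.indexOf(oldStr);
  while (index !== -1) {
    lines.push(text.slice(0, index).split("\n").length);
    index = text.indexOf(oldStr, index + 1);
  }

  if (lines.length === 0) return { ok: false, error: "old_str not found" };
  if (lines.length > 1) {
    return { ok: false, error: `old_str is ambiguous; matches on lines ${lines.join(", ")}` };
  }

  // Splice manually instead of String.replace so "$&"-style patterns in
  // new_str are inserted literally.
  const at = text.indexOf(oldStr);
  return { ok: true, text: text.slice(0, at) + newStr + text.slice(at + oldStr.length) };
}
```

Returning the error as data rather than throwing matches the plan's notion of a recoverable tool error: the model can read the line numbers and retry with a longer, unique old_str.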
Gate G1 (dogfood, blocking):
- make dev-server-sandbox (isolated MUX_ROOT, free ports; per dev-server-sandbox skill). Bootstrap browser workflows first: agent-browser skills get core.
- … agent-browser (per agent-browser/dogfood skills): enable the memory experiment in Settings; in workspace A tell the agent "remember that I prefer X"; verify file exists under the sandbox MUX_ROOT/memory/… (bash); create workspace B; ask "what do you know about my preferences?" → agent recalls via view unprompted.
- … bun run debug ui-messages --workspace <name>; record provider cache-read/cache-write token stats across consecutive turns (and across a mid-session memory create) to confirm the index placement preserves prompt caching.
- … SSHRuntime (localhost sshd or existing SSH test fixtures): project-scope create/view round-trip.
- … agent-browser screenshot --annotate) + repro video (record start/stop) attached via attach_file.
Phase 2 — Curation UI (Memory tab) (~750 net LoC)
Files:
- src/common/orpc/schemas/api.ts + src/node/orpc/router.ts (~180): memory.list (bulk: all scopes, one call), memory.read, memory.save (with expectedSha256), memory.delete, memory.setPinned, memory.onChange (eventIterator, async-generator bridge off the service's EventEmitter; same shape as workspace.onMetadata). All routes take workspaceId; the router resolves that workspace's metadata + active Runtime + checkout cwd through the existing WorkspaceService/runtime plumbing (same as plan-file IPC) so project scope never touches local fs paths on SSH workspaces. Routes reject when the experiment is off.
- src/node/services/memoryMeta.ts (new, ~80): host-local sidecar store (<muxHome>/memory-meta.json) introduced here with schema {pinned} per logical key, atomic persistence (write-file-atomic), self-healing load (corrupt file ⇒ start empty + log). P3 extends the schema with usage stats.
- tabConfig.ts + tabRegistry.tsx (~30): memory tab with featureFlag.
- src/browser/features/RightSidebar/Memory/* (new, ~450): scope-grouped file list, Markdown view/edit, pin toggle, delete, "agent edited" badge from change events, conflict banner on 409-style save rejection. Leaf components subscribe directly (colocation rule); usePersistedState only for pure UI state (expanded groups); pins live in the service-owned host-local sidecar (introduced here for pins; P3 extends it with usage stats), not in files or localStorage. Keyboard shortcut in KEYBINDS (hidden on mobile). Conditional rendering (no Radix portals) for happy-dom testability.
Tests: UI tests for list/edit/conflict flows; router tests for sha precondition and experiment-off ⇒ memory.* routes reject; sidecar self-healing (corrupt meta file ⇒ empty, no crash).
Gate G2 (dogfood, blocking): agent-browser session: edit a memory in the panel while an agent task is running → agent's next view sees the edit; force a conflicting save (agent writes between UI load and UI save) → conflict surfaces, no silent clobber; pin a file → badge/state survives reload. Screenshots + recording attached.
Phase 3 — Usage stats, hot set, preloading (~400 net LoC)
- … <muxHome>/memory-meta.json, atomic writes): {lastAccessedAt, accessCount, lastWriteAt, pinned} keyed by logical identity, not physical path: global:<relpath>, project:<projectId>:<relpath>, workspace:<workspaceId>:<relpath>. projectId = the existing stable project identity from Mux config where available, else a hash of {runtimeKind, remote host identity, normalized project root}; never the physical worktree path, so N worktrees of one repo share heat/pins, branch-agnostic by relpath. Recorded at the MemoryService chokepoint; never git-tracked.
- … src/common/constants/memory.ts) → cold (tool call). Auto-hot selection is gated on actual local usage (sidecar stats), which a cloned repo cannot fake.
- … loadedSkillSnapshots.ts/compactionHandler.ts/attachments.ts budgets): recomputed only at session start and compaction boundaries, never per turn, to preserve provider prompt caching.
Gate G3 (dogfood, blocking): pinned memory provably in context without a tool call (bun run debug ui-messages); hot set byte-identical across consecutive turns within a session (cache stability); measured index + hot-set char/token cost reported with screenshots.
Phase 4 — Later (not in this change set)
Pre-compaction "flush working context to memory" warning (compactionHandler hook); demotion-sampling experiment; cross-process locking if the global root is ever shared between processes; memory sync/import-export.
Total v1 (P0–P3): ~2,400 net LoC product code (range 2,300–2,900; P1 carries the plumbing risk).
Risks / notes
- .mux/memory + dotfile exclusion: the project root maps inside .mux/memory/, so the view dotfile-exclusion rule applies to its contents, not the root itself.
Acceptance criteria
- … memory.* oRPC calls; zero context-size change.
- … view/create/str_replace/insert/delete/rename in all three scopes on local and SSH workspaces; create errors on existing files; traversal/symlink escapes rejected on both local and remote, and the mode/sub-agent access matrix is enforced (tests prove each cell).
- make static-check + targeted tests green; all three dogfood gates passed with attached evidence.
Generated with mux • Model: anthropic:claude-fable-5 • Thinking: max • Cost: $305.22