Memory System Unification Redesign (Milestone arc: M-U)#
Status: Approved blueprint — §8 decisions owner-confirmed; not yet implemented. Delivery proceeds locally per phase and is pushed to GitHub only after local tests pass. Owner: memory subsystem is a single-author design; this redesign supersedes the incremental M5 → M-PD architecture described in
memory-system.md. That document remains the operator guide for the current (pre-M-U) behaviour; sections it describes are flagged below as superseded where M-U changes them. Scope of change:internal/memory/,internal/memquery/,internal/memconflict/,internal/projectdoc/,internal/catalog/adapter.go,internal/app/app.go, plus new migrations0033+.
1. Why this redesign exists#
The memory system grew through ~15 milestones (M5, M11, M13, M22, M25, M-PA…M-PD). Each shipped a correct feature, but the accretion left structural redundancy that now costs us performance, correctness, and operator time:
- Two parallel memory stores. A DB-backed pgvector store (shared,
queryable, cross-CLI) and the Claude Code harness's file memory
(
~/.claude-accounts/<acct>/projects/<cwd>/memory/*.md). The file store is Claude-only — Codex and Gemini cannot see it — so it actively undermines the system's reason to exist (cross-agent continuity). Today it is bridged one-way by a mirror (internal/memory/mirror.go), which is a sync band-aid, not unity. - Three scopes where two suffice.
session | project | global(0011_memory.sqlCHECK). A session is always built on a project; session-scoped memory is a redundant partition that fragments recall and complicates every read/write path. - Two ranking formulas.
internal/memory/ranking.go(full multi-factor) vsinternal/memquery/memquery.godecayScore(age-only). The same fact ranks differently inmemory_searchvsproject_search. - Mixed retrieval quality. goal/plan are matched lexically
(
LIKE) in cross-layer search, never semantically. The three cross-layer sub-searches run sequentially. - Manual-everything cleanup. The cleaner is 100% operator approval with hard, irreversible deletes and no project-lifecycle signal. Real-world result: hundreds of pending decisions, including obviously stale ones (e.g. the tech-stack fact of a finished project) that the operator must still hand-approve.
- A one-line write bug that disables the whole premise. Agents are
intended to write memory, but the auto-provisioned memory
integration key is granted only
session:read(internal/app/app.go:1129), so everymemory_storefrom an agent returns403. Cross-agent contribution — the entire point — has never actually run from the agents themselves.
This redesign is subtractive first: it removes redundant systems, scopes, and code paths, then adds a small number of better practices on the cleaner foundation.
2. Design constraints (fixed by the system owner)#
- The memory system serves all cloud-agent CLIs (Claude, Codex, Gemini, future ones) equally. Its job: accumulate each agent's contributions in one project brain so that switching agents preserves project familiarity.
- Scopes collapse to two:
project(default; key = cwd) andglobal(written only on explicit operator request). Thesessionscope is removed — session ≡ project. - Writes are free (no per-write human approval). The only human intervention point is conflict detection.
- No patching. Each phase is a self-contained, coherent unit that leaves the system simpler than it found it.
- Lossless, automatic upgrade for existing users. Operators already
running opendray must reach the new system by a normal
opendray updatewith no data loss and no manual SQL. Every migration is forward-only, idempotent, and never hard-deletes existing user data — rows are folded, archived, or re-embedded, never dropped. See §7. - Ship as one change, merge after validation. Because this is a
large, cross-cutting redesign, the whole arc (Phase 0-6) is built on
a single branch and lands as one PR, opened only when every phase
is complete and the migration has been test-validated end-to-end on a
copy of real data. No phase merges to
mainon its own.
Non-goals#
- Re-implementing the embedder zoo. We standardise on one dense embedder + a fallback; we do not add new embedder providers.
- Changing the channel/notify subsystems.
- A new UI framework. UI changes are limited to what the new data model requires (cleanup inbox shrinks; conflicts remain).
3. Target architecture — "one brain"#
[object Promise]Five pillars:
Pillar 1 — One store, two scopes#
- Single Postgres pgvector store. No file memory as a store (see Pillar 6 for the retirement path).
memories.scope ∈ {project, global}.projectkeyed by cwd;globalkeyed by''.sessionremoved.- All agents read and write the same project memory. Switching agents is instant continuity by construction, not by sync.
Pillar 2 — One write pipeline (free, self-consolidating)#
memory_store(text, scope=project) — agents default to project;
global requires an explicit operator-originated instruction.
On write (the hot path — no LLM call, must stay millisecond-fast):
- Embed with the active embedder.
- Semantic dedup within the same (scope, scope_key): nearest neighbour search, top-1.
- If similarity ≥
consolidate_threshold: fold — keep the higher-confidence / newer row as canonical, bumpfrequency/deduped_count, record the absorbed id inmerged_from, append provenance. Return the canonical id. Rule-based only; no LLM in the write path (aclaude --printcall here would put 5-15s of latency and subscription-quota cost on every near-duplicate write). - Else insert a new row.
The LLM text merge does NOT run at write time. Refining the folded
text into one better canonical statement is deferred to the background
consolidation sweep (Pillar 5), where latency and cost do not matter.
That sweep uses the pluggable-worker chain (M25): a local OpenAI-compatible
worker if configured, else the cloud-agent worker (claude --print /
gemini --print), which every opendray install has. Rationale: merge is
an easy LLM task (lighter than the gatekeeper/cleaner/conflict jobs these
workers already do), so capability is not the concern — keeping it off
the synchronous write path is.
No human approval on writes. The write-time gatekeeper (LLM durability filter) stays optional and off by default; quality is maintained by folding + ranking + the maintenance loop, not by a write gate. This replaces "store then dedup then clean up the backlog" with "don't create the duplicate in the first place".
Pillar 3 — One retrieval path, one ranker#
A single scoring function, defined once, used by both spawn injection and on-demand search:
[object Promise]internal/memquery's separatedecayScoreis deleted and routed through this function.goal/plan become first-class vectors (embedded), not
LIKEmatches. journal is already embedded (M-PB); embedding stays synchronous on append so there is no spawn-time blind spot.project_searchruns its facts/journal/docs sub-searches in parallel, then merges and ranks once.
Pillar 4 — Spawn injection (query-less, budgeted)#
At spawn there is no query yet, so injection ranks the project brain by
importance × recency (no similarity term), fills a token budget,
and assembles: goal + plan + top-N facts + recent journal. Delivery
stays per-provider (--append-system-prompt for Claude, AGENTS.md
for Codex, GEMINI.md for Gemini) — already provider-agnostic at the
source (internal/catalog/adapter.go:363 Prepare). Once the agent has
a task it uses project_search/memory_search with a real query.
Pillar 5 — One maintenance loop (auto; human only for conflicts)#
All destructive maintenance runs on soft-delete + grace period (reversible), which is what makes automation safe:
- Consolidation/dedup sweep — retroactively merges semantically equivalent facts (both the ones the write-time fold caught and any it missed). This is where the LLM text merge happens — refining folded near-duplicates into one canonical statement via the pluggable-worker chain. Off the hot path, so latency/cost are free.
- Staleness auto-archive — facts never retrieved + aged, OR facts belonging to inactive/finished projects (lifecycle signal: git inactivity, or operator "project done"), are auto-archived (soft; hard-purged after the grace window).
- Conflict detection (already wired,
internal/memconflict/, 24h) is the only producer of operator-inbox items.
The manual cleanup-approval queue is removed. Routine staleness never reaches the operator.
Pillar 6 — One embedder strategy (availability-tiered)#
Design principle — availability tiering. Not every operator runs a local AI, but every opendray install necessarily has a cloud-agent CLI. So each LLM/vector touch-point must name a backend that is guaranteed present, and prefer better backends only when they exist. The guaranteed-present backend differs by task type:
| Task | Best (if present) | Guaranteed-present floor |
|---|---|---|
| Embedding (vectors) | local dense (bge-m3 1024, nomic-embed-text, mxbai-embed-large, …) served by any local runtime, or a cloud OpenAI-compatible embedding endpoint |
BM25 — pure-Go, zero-dependency, already built |
| LLM text (merge, gatekeeper, cleaner, conflict) | local LLM via any runtime | cloud-agent worker (claude --print / gemini --print) |
Critical distinction: a cloud-agent CLI cannot produce embeddings —
Claude/Gemini --print are text generation, not vector models, and
Anthropic exposes no embeddings API. So the embedding floor is BM25,
not the cloud agent. The cloud agent is the floor for LLM text tasks
(Pillar 2 / Pillar 5 merge, gatekeeper, cleaner, conflict) only.
Local-runtime neutrality (required). "Local LLM" must not mean
"ollama only". Both the embedder and the LLM workers talk to a local
backend over the OpenAI-compatible HTTP contract (/v1/embeddings,
/v1/chat/completions), so any runtime that speaks it works:
ollama, LM Studio, vLLM, LocalAI, llama.cpp server, text-generation-webui,
etc. Configuration is a base URL + model name + (optional) API key — no
runtime-specific code paths. The existing internal/memory/embedder_http.go
and the summarizer's provider_openai_compat.go already follow this
contract; the redesign keeps that contract as the single local-integration
surface and does not special-case any one vendor. (A dedicated
provider_ollama.go may remain as a convenience preset, but it is one
preset over the generic contract, not the only path.)
- Embedder selection is tiered, not fixed: use a configured dense embedder (any local OpenAI-compatible runtime, or a cloud endpoint) when available; otherwise fall back to BM25 automatically. No hard requirement on local AI, and no lock-in to a single local runtime.
- On embedder change, a background re-embed runs automatically; the
search query no longer silently drops rows whose
embedderdiffers (todayWHERE embedder=$2instore_pgvector.gohides them).
4. What shrinks vs what is new#
| Removed / simplified (less redundancy) | Added (more function / performance) |
|---|---|
| File-memory store + harness "write a memory file" loop | Write-time semantic consolidation (upsert-by-meaning) |
session scope + all its read/write plumbing |
Soft-delete + grace window + auto-archive worker |
memquery.decayScore (second ranking formula) |
Project-lifecycle signal (git/done) driving archive |
Lexical LIKE doc search |
One unified scorer (two → one) |
| Manual cleanup-approval queue | Retroactive consolidation sweep |
| Silent multi-embedder row dropping | Automatic re-embed on embedder change |
Performance: fewer rows (consolidate + archive) → faster vector scans; synchronous journal/doc embedding → no spawn blind spot; parallel cross-layer search → lower latency; one canonical embedder + maintained HNSW → stable recall.
5. Data model changes#
New migrations (0033+). Exact column names to be finalised in each PR.
Scope collapse (
0034_memory_drop_session_scope.sql— 0033 was already taken by0033_memory_workers_capture.sql)- Migrate existing
scope='session'rows →scope='project', settingscope_keyto the session's cwd (UPDATE … SET scope='project', scope_key=(SELECT cwd FROM sessions WHERE id = memories.scope_key) WHERE scope='session'). - Adds the
archived_at/archived_reasoncolumns up front (the soft-delete primitive, see item 2) so the fold has a lossless place to put orphans. - Orphans (session row gone): recover cwd from
session_logsif possible; otherwise setarchived_at = now()(lossless — restorable, not dropped). NoDELETEin this migration. - Replace the CHECK constraint with
scope IN ('project','global')only after the fold, so no row violates it mid-migration.
- Migrate existing
Soft delete (
0034_memory_soft_delete.sql) — column may be co-introduced in 0033 (item 1); 0034 then only adds the behaviour.archived_at TIMESTAMPTZ NULL,archived_reason TEXT NULL.- All search/injection queries gain
AND archived_at IS NULL. - A purge job hard-deletes rows where
archived_at < now() - grace_interval(grace = 30 days, §8.2).
Consolidation accounting — no migration (done in Phase 3). The fold count (
deduped_count) and the lossless audit (merged_from) live in the existingmetadataJSONB; a dedicatedfrequencycolumn was dropped from the plan as redundant withdeduped_count, which already backs the UI.Migration numbering note: the filenames in this section were indicative. Actuals so far:
0034= scope collapse (Phase 1),0035= project_docs embedding (Phase 2). Phase 4's soft-delete + lifecycle land at0036+ when written.Lifecycle signal (
00NN_project_lifecycle.sql, Phase 4)project_docs(or a smallproject_statetable): addlast_git_activity_at TIMESTAMPTZ,status TEXT CHECK (active | archived) DEFAULT 'active'. Drives staleness auto-archive.
memory_cleanup_decisions (0026) is deprecated by Pillar 5 — kept
read-only for audit history, no new rows after Phase 4.
6. Phased delivery#
Phases are commits on one arc branch, not separate PRs (constraint §2.6). Each is a self-contained, tested unit so the branch stays green commit-by-commit, but the whole arc lands as a single PR opened only after every phase is complete and the migration is validated end-to-end on a copy of real data. Order is dependency-driven.
Phase 0 — Turn the write path on (unblocks everything) — DONE#
- Granted the
opendray-memoryintegration keymemory:read+memory:write; reconcile scopes on existing installs so the fix reaches users who registered the key before these scopes existed (scopesCoverininternal/app/app.go).globalwrites stay admin-only, enforced at the store handler (globalWriteAllowed). - Acceptance: an agent
memory_storesucceeds (no 403); the row lands inmemorieswithscope='project',scope_key=cwd. Verified the diagnosis live (read path 403s under pre-fix code). - Risk: low. Behaviour change = agents can now write (intended).
Unit tests for
scopesCover+globalWriteAllowed; packages pass-race.
Phase 1 — Scope unification — DONE#
- Migration
0034.sessionremoved from the scope model; anormalizeScopeboundary coerces the legacy literal to project (lossless) so old config / API callers / pre-migration rows never error. Capture rules coercetarget_scope='session'to project. Web + mobile UI and i18n drop the session option; slang strings regenerated. - Lossless fold: every
scope='session'row becomesscope='project'keyed by the session's cwd (joined viasessions). Orphans (session gone) are soft-archived (archived_at, restorable) rather than deleted. NoDELETE. - Validated: the fold ran on an ephemeral Postgres — 4 rows in, 4 out
(zero loss), 0 session rows remain, live-session row folded to its cwd,
orphan archived, new CHECK rejects session. Go
normalizeScope/Scope.Validatetable tests; webtsc -b && vite buildclean; mobiledart analyze libclean; i18n parity 100%. - Risk: medium (data migration), retired by the no-hard-delete rule + the ephemeral-Postgres validation above; full-data replay still happens at arc-final validation.
Phase 2 — Retrieval unification — DONE#
- One scorer:
memory.RankingScoreFields(M-PC formula over raw fields) now serves bothmemory.Serviceandmemquery;decayScoredeleted. A fact ranks identically inmemory_searchandproject_search. - goal/plan are first-class vectors: migration
0035adds embedding columns toproject_docs(copy of 0031);PutDoc/ApproveProposalembed synchronously and null the vector on content change;RunDocEmbedBackfillcatches up history;searchDocsscores by cosine with a lexical 0.6 fallback only for not-yet-embedded docs. project_searchfans its three sub-searches out over a WaitGroup (wall-clock = slowest layer, not the sum).- Backend-only. The web/mobile ranking mirrors already track
ranking.go(unchanged) and the Hit DTO additions are additive, so no client change was required — verified. - Validated:
-raceacross memory/memquery/projectdoc/app; the 0035 index predicates exercised on an ephemeral Postgres (full vector replay is the arc-final step on the real pgvector DB). - Risk: low/medium, retired by the ranking table tests + race pass.
Phase 3 — Write-time fold (cheap, no LLM) — DONE#
- No migration. The fold accounting (
deduped_count+ the newmerged_fromaudit) lives in the existingmetadataJSONB rather than a redundantfrequencycolumn —deduped_countalready backs the mobile "merged ×N" badge. - Semantic dedup is now the default write path: an unset
dedup_thresholdresolves to an embedder-relative default (~0.85 dense, ~0.2 BM25); a negative value is the explicit off switch. On a hit the write folds rule-based (no LLM): newer text becomes canonical, the superseded text is preserved inmerged_from(capped- so the fold is lossless,
deduped_countbumps. LLM refinement of the canonical is deferred to Phase 4's background sweep.
- so the fold is lossless,
- Acceptance (met): two paraphrases yield one row with
deduped_count=1andmerged_fromcarrying the superseded text; negative threshold disables; no LLM on the write path. - Deferred to Phase 4: the cleaner's duplicate-merge metadata bug
(it deletes a dup without recording
merged_fromon the survivor) is fixed when Phase 4 overhauls the cleaner. - Risk: low. Default-on is behaviour-compatible (lossless) and the threshold is tunable; covered by boundary tests.
Phase 4 — Self-maintaining cleanup — DONE (backend 4.1–4.3 + 4.4 UI)#
Owner-confirmed shape: keep the LLM judge, auto-apply its verdicts as reversible soft-archives (no approval queue); remove the cleanup inbox UI and replace it with a read-only Archived (restorable) view + one-click restore; operator inbox keeps conflicts only.
- 4.1 (done): all 8 memory read queries filter
archived_at IS NULL; Archive / ArchiveByScope / Restore / PurgeArchived primitives (no migration — the columns landed in 0034). - 4.2 (done): cleaner auto-applies verdicts as soft-archive (a
"duplicate" is archived noting the survivor — soft-delete preserves it,
so the old merge-metadata gap is moot);
PurgeExpiredhard-deletes past the 30d grace each tick. - 4.3 (done):
ArchiveDormantStale— a dormant project's never-hit aged facts auto-archive (hit facts and active projects untouched).LifecycleDormantDays(90d default, negative disables). - 4.4 backend (done):
GET /api/v1/memory/archived(read) +POST /api/v1/memory/{id}/restore(admin);ListArchivedstore primitive;Memory.ArchivedAt/ArchivedReason. - 4.4 frontend (done). The cleanup-approval UI is fully removed and
replaced by the read-only Archived (restorable) view across web +
mobile + i18n:
- Web:
pages/CleanupInbox.tsx→ newpages/Archived.tsx(cross-project list grouped by scope + restore);router.tsx/memory/cleanup→/memory/archived;pages/Memory.tsxnav +CommandPalette.tsxentry retargeted;ProjectScreen.tsxCleanup tab → Archived tab (header badge, query, card all swapped, run/approve/ reject ripped out);MemoryPanel.tsxstat → archived count. - Shared API: deleted
app/shared/src/lib/memoryCleanup.ts; addedlistArchived+restoreMemory(+archived_at/archived_reasononMemoryRecord) tomemory.ts. - Mobile: deleted
memory_cleanup_api.dart+features/memory_cleanup/; newfeatures/memory_archived/ archived_screen.dart;memory_api.dart+=listArchived/restore;models.dartMemory +=archivedAt/archivedReason;more_screen.darttile +project_screen.darttab swapped. - i18n en/es/zh: removed
web.cleanupInbox.*,web.memory.navCleanupInbox,web.project.header.cleanupPending,web.project.tabs.cleanup,web.project.verdictLabel.*,web.project.cleanup.*,more.items.cleanupInbox.*,memoryCleanup.*,project.cleanupFailed/reason/willMergeInto; addedweb.archived.*,web.memory.navArchived,web.project.tabs.archived,web.project.archived.*,web.project.header.archivedCount,more.items.archived.*,memoryArchived.*,project.archived.*; slang regenerated. Thememory_cleanup_decisionsreset-audit path is retained (table kept read-only). - Validated: web
tsc -b+vite buildclean; mobiledart analyze libclean (0 issues); i18n parity 100% (es/zh, 0 missing/extra/token).
- Web:
- Validated: the soft-archive read-filtering, auto-archive, purge,
and dormancy SQL all exercised on an ephemeral Postgres + unit tests;
-racegreen across memory/cleaner/app. - Risk: medium, retired by reversibility (soft-delete + 30d grace) and conservative defaults (90d dormancy, never-hit only).
Phase 5 — Retire the file layer — DONE#
- One-time import (done).
Mirror.BackfillAllenumerates every project scope_key already in the store andSyncCwds each, launched once fromRunServices(internal/app/app.go) next to the M-PB/Phase-2 embed backfills. So an upgrading operator's pre-existing Claude file memories fold into the single DB store on first boot, not only after each project's next spawn. Idempotent (SyncCwd dedupes by source_path+mtime), so it self-skips once caught up. - File layer suppressed via guidance (done). opendray can't disable a
CLI's built-in file memory directly, so the lever is the spawn
system-prompt:
memoryGuidanceText(internal/catalog/adapter.go) now has a "Use this, not your built-in file memory" section telling every agent to route durable facts throughmemory_storeand not write Claude's# Memory/MEMORY.md(CLI-local → invisible to the next Codex/Gemini), noting opendray already imports existing files and injects project memory at startup, so the file layer is redundant. - Capture net kept (done). The per-spawn mirror (
WithMemoryMirror) stays wired, so any file memory an agent still writes during the transition keeps flowing into the DB. A project that has file memory but no DB rows yet imports on its first spawn (the backfill only front-loads already-known projects) — nothing is lost, only the timing of the import differs. - Acceptance: a Claude session with no local memory files still receives full project memory at spawn — already true via the ambient + projectdoc DB injectors (unchanged here); agent-authored memories land in DB (Phase 0 scope fix) and are visible to a subsequent Codex/Gemini session in the same cwd.
- Validated:
go build ./...clean;go test -racegreen across memory/catalog/app; new DB-free unit tests formatchesEncodedCwd,findClaudeMemoryDirs(temp-HOME, symlink-resolved), and theBackfillAllnil guard. Full file→DB import over real data is part of the arc-final replay validation. - Risk: low/medium (operator-visible: agents stop writing file memory). Retired by keeping the one-way mirror as the legacy capture net and by the import being idempotent + lossless.
Phase 6 — Embedder unification — DONE#
- Availability tiering — already structural, confirmed. The backend
matrix in
buildEmbedder(internal/app/app.go) already realises decision §8.1:auto/bm25→ the pure-Go BM25 floor (guaranteed present, zero-dep);http→ any OpenAI-compatible dense endpoint;local→ ONNX. Dense is used when configured; BM25 is the floor otherwise. We deliberately do not auto-switch an existingauto(BM25) install to a detected local dense model on upgrade — that would trigger a surprise full re-embed and violate the smooth-upgrade constraint (§2.5). Switching to dense stays an explicit operator config change, which the new converge loop then services automatically. - Automatic background re-embed on change (new).
Service.RunReembedConverge(internal/memory/reembed_converge.go), launched once fromRunServicesnext to the Phase-5 backfill. Each cycle:driftCount(cheap COUNT GROUP BY viaCountByEmbedder) finds rows whoseembedder≠ active; if any, it runs the existing resumableReembedpass and re-stamps them to the current embedder, then sleeps (short after progress,IdleInterval5m when converged or stalled). So an operator who switches embedders just restarts — convergence is automatic, no "Migrate" click. The manualPOST /reembedendpoint stays as an on-demand trigger. Self-skips in steady state; an embedder outage backs off instead of spinning. - The silent
embedder-mismatch search exclusion — intent met by convergence, guard retained.PgvectorStore.SearchkeepsWHERE embedder = $activebecause cross-(embedder, dim) cosine is mathematically invalid — comparing a bge-m3 1024-vec against a BM25 384-vec is meaningless, and different dims can't even share an index. Dropping the predicate outright would surface garbage or error mid-migration. The redesign's actual goal — "no row stays silently invisible" — is met instead by the converge loop driving drift to zero, after which the predicate excludes nothing. So the guard is kept for correctness; convergence removes the silent permanence it used to have. - Acceptance (met): changing the configured embedder triggers a background re-embed automatically; rows converge to the active embedder and rejoin recall without operator action. Re-embed is batched + resumable (cursor by id) and rate-limited (sleep between passes), so it stays I/O-flat at the single-gateway scale.
- Validated:
go build ./...clean;go test -racegreen across memory/app; new unit tests fordriftCount(mixed + converged),Reembeddraining drifted rows with correct re-stamping, andRunReembedConvergeexiting on ctx cancel. Full embedder-swap replay on real pgvector is part of arc-final validation. - Risk: low/medium, retired by: upgrade-safety (no auto-switch of existing installs), the dim-correctness guard staying in place, and convergence being resumable + idempotent + soft-failing.
7. Migration & backward compatibility#
Constraint §2.5 is a hard requirement: an existing operator reaches the
new system by a normal opendray update, with no data loss and no
manual SQL. Concretely:
Lossless. No M-U migration runs a DELETE against user data. The
session→project collapse folds rows (orphans are archived, not
dropped); cleanup uses soft-delete + 30-day grace (restorable); the
file layer is imported before it is retired; an embedder change
re-embeds, never discards. Every destructive-looking step is a
reversible state change, not a deletion. Each data migration ships with
a counted dry-run and a pre-migration opendray memory export (the
backup/restore path already exists, backup/service_import.go) so even
operator error is recoverable.
Automatic on upgrade — DONE. Migrations now auto-apply on startup,
fail-closed. app.New (internal/app/app.go) calls st.Migrate right
after store.Open and before catalog.New (which seeds tables
migration 0001 creates — preserving the #162 fresh-DB ordering). A
migration error aborts boot rather than running the new binary against
the old schema. So opendray update + restart reaches the new schema
with no separate step; opendray migrate stays as a standalone command
for operators who prefer explicit control. Migrations are idempotent,
forward-only, and transactional (tracked in schema_migrations), and
the serve path was previously the one place that did not migrate, so
this only adds a no-op once everything is applied.
Compatibility staging. Phases 0–3 are behaviour-compatible for the operator (writes start working; rankings unify; dedup tightens). Phases 4–5 are the operator-visible changes (cleanup inbox disappears; file memory retires) and ship behind flags with conservative defaults so the first upgraded run behaves like the old one until the operator opts in.
memory-system.md is updated in the same PR to keep the operator guide
truthful; the superseded sections are: five-layer/three-scope model
(§"The five layers", §"Project isolation"), Gatekeeper as the primary
quality gate (§"Quality gates"), and the manual Cleanup inbox
(§"Cleanup inbox").
8. Resolved decisions (owner-confirmed)#
All decided. Two governing principles: availability tiering (Pillar 6) — prefer the best backend present, fall back to a guaranteed-present floor, and the floor differs by task type because a cloud-agent CLI can do LLM text but cannot produce embeddings — and keep LLM work off the write hot path (decisions 5–6).
- Embedder — tiered and runtime-neutral. Prefer a dense embedder
when available — a local model (
bge-m31024,nomic-embed-text,mxbai-embed-large, …) served by any OpenAI-compatible local runtime (ollama, LM Studio, vLLM, LocalAI, llama.cpp, …), or a cloud embedding endpoint. Fall back to BM25 automatically when none is configured. BM25 is the guaranteed-present floor (pure-Go, zero-dep). The cloud-agent CLI is not an embedding backend (no vector API).consolidate_thresholdis embedder-relative (≈0.85 dense / ≈0.2 BM25, matching today's dedup thresholds). - Grace window — 30 days soft-delete → hard-purge.
- Write-time gatekeeper — kept as an optional knob, off by default. Quality is carried by consolidation + ranking + maintenance, not a write gate.
- Lifecycle "project done" trigger — both: git-inactivity threshold of 60 days, and an explicit operator "archive project" action.
- Consolidation merge — split across two paths. Write-time = a
cheap rule-based fold (keep canonical + bump frequency + record
merged_from), no LLM, to keep writes millisecond-fast. Background sweep = the LLM text merge via the pluggable-worker chain (M25): any local OpenAI-compatible runtime if configured, otherwise the cloud-agent worker (claude --print/gemini --print), which every install has. The cloud agent is a valid backend here because merge is an LLM text task (unlike embedding in decision 1). - LLM merge placement — background only, never the write hot
path. A
--printcall costs 5-15s + subscription quota; putting it on every near-duplicate write is unacceptable. Write-time folds cheaply; the background consolidation sweep refines text when latency and cost are free.
9. Code reference map (entry points touched)#
| Concern | Location |
|---|---|
| Write-scope bug (Phase 0) | internal/app/app.go:1127–1160 (ensureMemoryIntegration) |
| Scope definitions / CHECK | internal/store/migrations/0011_memory.sql |
| Memory service (store/search/dedup/rank apply) | internal/memory/service.go |
| pgvector store (insert/search/delete/RecordHits) | internal/memory/store_pgvector.go |
| Unified ranker target | internal/memory/ranking.go |
| Cross-layer search (second formula to delete) | internal/memquery/memquery.go |
| Spawn injection (provider-agnostic) | internal/catalog/adapter.go:363 (Prepare) |
| Project docs (goal/plan/journal) | internal/projectdoc/ |
| Cleaner (to be replaced by auto + soft-delete) | internal/memory/cleaner/ |
| Conflict detection (kept; the only human gate) | internal/memconflict/ |
| File-memory mirror (one-way; retire in Phase 5) | internal/memory/mirror.go |
| MCP memory tool surface | cmd/opendray/mcp_memory.go |
10. Success criteria for the arc#
- One store, one scope default, one ranker, one embedder.
- An agent of any kind can write project memory; a different agent in
the same project reads it at spawn — verified by the cross-CLI smoke
test in
memory-system.mdextended to a write→switch→read loop. - The operator's recurring inbox work is conflicts only; routine staleness and duplicates are handled automatically and reversibly.
- An existing operator upgrades via a single
opendray updateand lands on the new system with zero data loss and zero manual SQL — validated by replaying the full migration on a copy of real pre-upgrade data and reconciling row counts to no loss. - Net negative line count in
internal/memory*after the arc (the redesign removes more than it adds).
11. Phase 7 — Config/settings consistency + universal auto embedder#
Status: Approved (owner-confirmed 2026-06-06). A follow-on arc that closes the gap between the M-U implementation (Phases 0–6) and the settings surface operators actually see, and turns
backend = "auto"into a truly universal, self-configuring choice. Delivered on the samefeat/memory-unificationbranch.
11.1 Why#
The M-U arc unified the engine but left the config/settings surface describing the pre-M-U world, and left one UX trap that only the designer could see through:
- Leftover deprecated keys/features. The
sessionscope (removed in Phase 1) is still offered in the server-settings default-scope selector on both web (ServerSettings.tsx) and mobile (server_settings_screen.dart) — Phase 1 dropped session from the memory read/write UI but missed the settings selector.config.godoc comments still describe the cleaner as a proposal queue (Phase 4 made it auto-apply), still listsessionas a valid scope, still frame ONNX as "Phase 2 will add", and still presentchromemas a real store even thoughbuildStoreonly implements pgvector and errors onstore="chromem"(a config that bricks boot).[memory.cleaner]. summarizer_idis a dead key since M25 (dispatch moved to the worker registry). - New features with no config surface.
LifecycleDormantDays(Phase 4.3) andGracePeriod(Phase 4.2) exist oncleaner.Configwith hardcoded defaults (90d / 30d) but are not wired fromconfig.toml— operators cannot tune or disable dormant auto-archive or the restore window.config.example.tomlhas no[memory]block at all. - The
autotrap.buildEmbedderreturns BM25 unconditionally forbackend="auto"and never consults theAutoDetectprobe or the configured[memory.http]endpoint. So an operator who configures a dense endpoint but leaves the defaultbackend="auto"silently runs BM25 with their dense model dormant — and a fresh operator who never installed an embedding model has no signal that semantic memory is off. Phase 6 deliberately keptauto=BM25 to avoid a surprise re-embed on upgrade — but that fear is now obsolete: the Phase 6 converge loop already makes re-embed safe, automatic, resumable, and rate-limited.
11.2 Universal auto embedder (supersedes the Phase 6 "auto never auto-promotes" decision)#
backend = "auto" (the default) now resolves by availability tiering,
upgrade-only, in resolveMemoryService (which has both the config and
the store, so it can probe + read CountByEmbedder):
- Explicit backends unchanged.
bm25/http/localbehave exactly as before. Onlyauto(and empty) gains intelligence. [memory.http]configured + reachable (ProbeEndpointsucceeds) → use the dense HTTP embedder. The converge loop re-embeds any BM25 rows → dense automatically. Kills the "configured but dormant" trap.- Nothing configured / not reachable, and no dense rows yet → BM25 floor. A fresh install with no embedding model Just Works.
- Upgrade-only, never churn (the owner-chosen posture). Intent is
derived from
CountByEmbedder: if the DB already holds rows from a dense embedder, that is the intended tier.- Dense intent + endpoint reachable → dense (steady state).
- Dense intent + endpoint unreachable at boot → keep the dense
embedder active (its
Name()matches the existing rows, so reads by importance×recency / spawn injection keep working, and the converge loop sees no drift → no re-embed → no churn). Query similarity search and new writes degrade with a loud WARN until the endpoint responds, then auto-resume. The system never downgrades dense→BM25, because that would both hide every dense row (WHERE embedder=$active) and trigger a lossy re-embed. - Edge — dense rows but
[memory.http]removed from config → fail-closed with a remediation message ("restore the endpoint, or setbackend=\"bm25\"to abandon dense"), because we cannot reconstruct the dense embedder and must not silently churn.
local(ONNX) stays explicit.autocovers the common HTTP-dense + BM25-floor cases; an operator who wants the cgo ONNX path setsbackend="local"explicitly (it depends on a build tag + on-disk model files, so silent adoption is inappropriate).
Status surfacing (the "operator can't see it" fix). /memory/status
(and the web/mobile Memory settings) now show the effective embedder,
the configured embedder, and a degraded flag, e.g. "Active: BM25
keyword (no embedding model configured — semantic memory off)", "Active:
BM25 (a dense endpoint is configured but unreachable — using BM25 until it
responds)", or "Active: Qwen3 dense (converging N rows…)".
11.3 Settings/config cleanup (subtractive, matches §2.4)#
- Drop the
sessionoption from the web + mobile default-scope selectors (→project | global); regen slang; i18n parity. - Remove
chromemfromconfig.godocs,config.toml,settings.ts, and the mobile field list (single pgvector store, §Pillar 1).Storestays as a pgvector-only, forward-compat key. - Rewrite the stale
config.godoc comments (cleaner "proposes" → auto-apply soft-archive; scope drops session; ONNX/chromem framing;similarity_thresholddefault is 0.1, not 0.7). - Remove the orphaned cleaner approval-queue routes (
/memory/cleanup/ decisions,/approve,/reject) — no frontend caller after Phase 4.4; thememory_cleanup_decisionstable stays read-only for audit.
11.4 New config knobs (close the §11.1 omissions)#
[memory.cleaner].lifecycle_dormant_days→ wired intocleaner.Config.LifecycleDormantDays(default 90; negative disables). The §8.4 "60 days" figure is superseded by the 90-day value that Phase 4.3 shipped and validated; the doc is reconciled to 90.[memory.cleaner].grace_days→ wired intocleaner.Config.GracePeriod(default 30; decision §8.2).config.example.tomlgains a fully-documented[memory]block (auto tiering, gatekeeper off-by-default, cleaner, lifecycle/grace).
11.5 Acceptance#
- A fresh install with no embedding model runs BM25 and the UI says so.
- Configuring
[memory.http]and restarting switches to dense with no manualbackendedit and no data loss (converge handles the re-embed). - A dense endpoint outage never downgrades or churns; it degrades and auto-recovers.
- No removed feature (
session, chromem, approval queue) remains selectable/dead in any settings surface; every shipped maintenance knob (dormancy, grace) is reachable fromconfig.toml.