← Command Centre

Claude Operating System

The hybrid AI-infrastructure layer merging Brainzyme® numbered sessions with Agentic OS patterns — session management, thread recovery, context hygiene, daily digests, and agentic workflow plumbing. Two layers working together: numbered sessions as the backbone, daily memory as the complementary layer.

v0.3.0 · 6 May 2026 · Hybrid Session System (Agentic OS migration)
Claude OS Hybrid User Guide (Google Doc) — full walkthrough of the hybrid session system for new users
→ GSD Task Spine dashboard — the durable canonical task store underneath Claude OS. M1 shipped, M2 planned — detailed milestone write-up with charts.
NEW · 2026-05-15
Dream Review tab is live
23-agent pattern analysis + Codex 5.5 + Gemini 2.5 Pro + Karpathy / Top-15 / 3-LLM positioning. 12 develop-here decisions with verdict dropdowns & feedback boxes — push to Inbox as JSON.
Jump to Dream Review →

You have four commands. One launches Claude with everything pre-configured for recovery. One catches you up inside a running Claude session. One saves your state mid-flight before context gets expensive. One wraps up the session with automated banking, daily digest generation, and skill reconciliation.

The cycle when context climbs: /bank/clearclaude-resume N. That's it. The rest is detail.

One-time install — PATH setup for the shell wrapper

The shell wrapper lives at F:\Agentic-OS\bin\claude-resume.cmd. To call it as just claude-resume from any terminal, that folder needs to be on your user PATH. Run this in PowerShell once:

$bin = "F:\Agentic-OS\bin"
$path = [Environment]::GetEnvironmentVariable("Path", "User")
if (($path -split ";") -notcontains $bin) {
  [Environment]::SetEnvironmentVariable("Path", "$path;$bin", "User")
  Write-Output "Added: $bin"
} else { Write-Output "Already on PATH." }

After running it, close and reopen any open terminals — PATH changes only affect newly-launched processes. The change is permanent across reboots.

Cold start — from a terminal

claude-resume <N>

Boots Claude Code (Opus 4.6 + max-effort + 1M-context attempt) with two modes:

  • Lightweight (default) — fresh thread + Sonnet catch-up brief via /resume <N>. Right after /bank + /clear for a clean low-cost restart.
  • Heavyweightclaude-resume <N> --original looks up most recent thread UUID and runs claude --resume <UUID>. You're back IN the old thread, full transcript loaded. Right for crash recovery / quick breaks. Defeats /bank+/clear cost saving — full prior context restored.
claude-resume 32
claude-resume 32 --original
In-session — refresh current session OR switch

/resume [N optional]

Without N, autodetects the current session by looking up which session file contains this thread's UUID. With explicit N, switches to (or refreshes from) a different session. Dispatches a Sonnet subagent to read prior .jsonl transcripts and presents a visible catch-up brief.

/resume

# or, to override:
/resume 32
Mid-session — before clearing

/bank [N optional]

Without N, autodetects the current session and banks there. With explicit N, banks to a different session (warns if N doesn't match the autodetected current session). Writes a dated ### Bank entry block to ## Bank log; refreshes the top-of-file pointer. Save-only — never auto-clears.

/bank

# or, to override:
/bank 32
End of session — automated wrap-up

/wrap-up

Triggers the meta-wrap-up skill — automated end-of-session banking, daily digest generation, and skill reconciliation. Consolidates session findings into a dated daily memory entry and ensures nothing falls through the cracks between sessions.

/wrap-up

The high-context cycle

When you notice context climbing past ~500K tokens or the conversation feels slow, stop and run this:

/bank

/clear

claude-resume 32

No N needed for /bank — it autodetects the current session. claude-resume still requires N because the new terminal has no context to detect from yet. The new thread starts with a small context window and the freshly banked state as its first reference. Fidelity stays high; weekly usage stops bleeding.

After /bank: you'll see one line — Banked to session N (NAME). Ready to /clear and resume when you're ready. The conversation is unchanged. Keep working in this thread, or run /clear when you actually want a fresh start. The bank entry waits in the session file until you do.

When to use which

SituationCommand
Thread crashed / closed / /clear'd, want to come backclaude-resume <N> from a fresh terminal
Inside Claude, want a refresher on what this thread's session was doing/resume (no arg — autodetects)
Inside Claude, switching from one session to another/resume <N> (explicit override)
Context approaching limit, restart soon makes sense/bank/clearclaude-resume <N>
Context high but you want to KEEP working in this threadClaude Code native /compact (lossier but no restart)
End of a milestone, might come back later/bank (no arg, no restart needed)
About to do extended tool-heavy workPre-emptive /bank first
Done for the day, want to close out cleanly/wrap-up — banks + generates daily digest + reconciles skills

/bank vs /wrap-up vs /compact vs /clear — the four-tool hierarchy

Four tools that look similar but do different things. /bank mid-stream, /wrap-up at session end, /compact when context fills, /clear when you actually want to start fresh.

ToolWhat it doesTouches conversation?Use when
/compactAnthropic-side context summarisation. Keeps the same thread/UUID alive.Yes — replaces older turns with a summaryContext approaching limit but you want to keep working in the same thread. No save to disk.
/clearWipes the in-memory conversation, same Claude process / terminal. New thread UUID starts.Yes — empties itYou're done with the prior topic and want a clean slate without restarting Claude.
/bank [N]Writes thread state into context/sessions/session-N-*.md. Conversation continues.NoMid-stream save when you might keep going OR resume later. Save-only.
/wrap-upFull end-of-session checklist: bank + collect feedback + update learnings + commit.Bank-style write + git commitTrue end of working session. The "everything I should do before signing off" macro.

Phone-thread continuity: why /compact > /clear

  • /compact keeps the thread UUID alive — the phone app stays connected to the same conversation, just with summarised history. No reconnection.
  • /clear ends that thread and starts a new one in the same terminal process — the phone app's mirror of the old thread will go stale; reopening on the phone shows the new (empty) thread.
  • Restarting Claude entirely (closing PowerShell, relaunching claude or via claude-resume N) creates a brand new process AND new thread. Phone definitely sees a fresh boot.

Recommended rhythm

/bank/compact, no /clear. Cleanest pattern for keeping the phone-side mirror coherent. /compact is fuzzy but warm; /clear is cold. The only time /clear earns its keep is when context has gotten genuinely confused (off-topic drift, hallucinated state) and you want a deterministic restart — but in that case /bank then claude-resume N from PowerShell gives you the same clean slate plus a Sonnet brief, so /clear is rarely the best tool.

/bank vs /wrap-up specifically

/wrap-up runs /bank as part of its checklist, then layers feedback collection, learnings updates, and a git commit on top. If you're treating each session-end as a transaction (commit included), use /wrap-up. If you're banking mid-stream knowing you'll keep going (or resume later from the phone), use /bank.

Commands

CommandWhere it runsArgsWhat it does
claude-resume <N>Shell terminal (cmd.exe / git-bash / PowerShell — wrapper at F:/Agentic-OS/bin/claude-resume.cmd or .sh)N = session number (required)Launches Claude Code with default model + effort + betas from settings.local.json; first message is /resume N. N required because cold-boot has no context to autodetect.
/resume [N]Inside Claude CodeN = session number (optional)If N omitted: autodetect via current thread UUID → which session file lists it. If N provided: glob that session. Then read frontmatter + body → dispatch Sonnet → present brief → confirm → resume.
/bank [N]Inside Claude CodeN = session number (optional)If N omitted: autodetect current session. If N provided AND mismatches autodetected: warn before banking. Compose dated entry → append to ## Bank log → refresh top-of-file pointers.
/wrap-upInside Claude CodeTriggers meta-wrap-up skill: banks session state, generates a daily digest entry from session activity, reconciles skills used in this session. End-of-day command.
/clear (native)Inside Claude CodeClears the conversation history; starts a new thread on the same session ID. Pair with /bank first.
/compact (native)Inside Claude CodeSummarises the conversation in place. Lossier than bank+restart but doesn't require a new thread.

Key file paths

PathPurpose
F:/Agentic-OS/bin/claude-resume.cmdWindows shell wrapper for cold-boot resume
F:/Agentic-OS/bin/claude-resume.shBash variant for git-bash
F:/Agentic-OS/.claude/commands/resume.mdThe /resume slash command body
F:/Agentic-OS/.claude/commands/bank.mdThe /bank slash command body
F:/Agentic-OS/.claude/commands/wrap-up.mdThe /wrap-up slash command body
F:/Agentic-OS/.claude/session-thread-register.shPostToolUse hook that auto-stamps thread UUIDs into session frontmatter
F:/Agentic-OS/.claude/settings.local.jsonDefaults: model, effort, betas, plus the hook registration
F:/Agentic-OS/memory/working-context/session-<N>-*.mdThe session file. Body sections (## Objective, ## Pending, ## Bank log, ## NEXT-SESSION RESUME) are first-class state.
F:/Agentic-OS/memory/daily/Daily digest entries generated by /wrap-up
F:/Agentic-OS/SOUL.mdAgent personality and operating principles (from Agentic OS)
F:/Agentic-OS/USER.mdUser preferences and context (from Agentic OS)
F:/Claude Root/Legacy project root (read-only archive)

Settings keys

Top-level keys added to F:/Agentic-OS/.claude/settings.local.json:

KeyValueEffect
model"claude-opus-4-7"Default model for all launches in this project
effort"max"Highest extended-thinking tier. Replaces the older ultrathink magic word.
betas["context-1m-2024-08-07"]Best-effort 1M-context attempt. May be silently ignored if Claude Code doesn't recognise the beta header (see Build Decisions tab for why this is uncertain).
hooks.PostToolUseMatcher Write|EditWraps the thread-register script with CLAUDE_TOOL_INPUT="$(cat)" bash … so the script reads stdin via the established convention.

Session file frontmatter contract

---
session: 32
name: Targeting & Go-Live
status: ACTIVE                 # or ENDED
started: 2026-04-12
last_updated: 2026-04-24 22:50  # auto-bumped by /bank and the hook
last_touch: One-line description of the most recent state
originSessionId: 7cc197d0-db4a-4801-af79-b6f454701b09  # first .jsonl UUID
continuationSessionIds:        # appended automatically by the hook
  - 3e0a1a9d-8727-46ec-b00f-be1c93095b68  # 2026-04-17 resume
  - c13e1bca-8226-4b9e-a79a-355de8bb5fec  # 2026-04-24 resume
absorbed_sessions: [28, 26]    # optional — sessions merged into this one
inherited_from: 29             # optional — predecessor session
---

The model in one paragraph. A Brainzyme "session" (numbered 18, 32, 38, …) is a long-lived workstream that may span dozens of Claude Code threads. Each thread is a single .jsonl transcript file. The session file (session-N-*.md) is the durable record; the thread is the working memory. /bank writes from working memory to durable record. /resume reads the durable record + selectively reads transcripts via a Sonnet subagent. The PostToolUse hook keeps the durable-record-to-transcript map (continuationSessionIds) accurate without manual bookkeeping.

Resume flow

User runs: claude-resume 32 | v [shell wrapper] Reads model + effort + betas from settings.local.json Launches: claude --effort max "/resume 32" | v Claude Code starts a NEW thread (new .jsonl UUID) | v [/resume slash command fires] | +--> Glob memory/working-context/session-32-*.md +--> Read frontmatter (originSessionId + continuationSessionIds) +--> Read body verbatim (Objective + Pending + Bank log + NEXT-SESSION RESUME) +--> Resolve current thread UUID (mtime-most-recent .jsonl in project folder) +--> Append current UUID to continuationSessionIds, bump last_updated +--> Dispatch Sonnet subagent with UUID list + read budget (last 500 lines primary) | v [Sonnet subagent] wc -l on transcripts -> tail last 500 lines of most recent Returns structured brief: in-progress / last actions / open loops / next action / gotchas | v [Claude Opus 4.6 displays] Sonnet brief verbatim Session file body sections verbatim (Objective + Pending + Bank log + NEXT-SESSION RESUME) Asks: "Match your recollection? Ready to continue?" | v On confirm: standard session-start checks (pending_tasks_registry, RAG pre-flight) then continue work from the Next action

Bank flow

User runs: /bank (no N — most common) /bank 32 (explicit override) | v Resolve target session: - No arg: get current thread UUID (mtime of newest .jsonl) grep session files for that UUID -> the match is current session (multiple/none -> ask user) - With arg: glob session-N-*.md (warn if mismatch with autodetected) | v Compose bank entry from in-context recall: - Working on (one line) - Decisions / progress this thread (bullets) - Files touched (created / modified) - Open loops (- [ ] checklist) - Next action (single sentence) - Gotchas (only if non-obvious) | v Empty? -> "Nothing to bank — skipping" and exit | v Append to ## Bank log section (create if absent, append-only) Refresh frontmatter last_updated / last_touch Replace ## NEXT-SESSION RESUME content with fresh pointer | v Save file -> "Banked to session 32. Ready to /clear and resume when you're ready."

Thread-register hook flow

[Any Write or Edit tool fires] | v PostToolUse hook receives JSON via $CLAUDE_TOOL_INPUT: { session_id, tool_name, tool_input.file_path } | v file_path matches working-context/session-*.md ? no -> exit 0 silently yes -> continue | v session_id already in frontmatter (origin or continuations) ? yes -> exit 0 (idempotent) no -> continue | v Replace any <> placeholder Or fill empty originSessionId Or append to continuationSessionIds with "# YYYY-MM-DD resume" comment Bump last_updated | v Write file. Log to .claude/session-thread-register.log. Exit 0. Errors: logged but never block the underlying Write/Edit.

The Hybrid Model. Claude OS v0.3.0 merges Brainzyme® numbered sessions (the proven backbone built since April 2026) with selected patterns from Simon Willison's Agentic OS (SOUL.md/USER.md, heartbeat, wrap-up, progressive disclosure, daily memory). Two layers work together: numbered sessions remain the primary unit of work, and daily digests act as the complementary cross-session memory layer.

Numbered sessions are long-lived workstreams (e.g. "Session 32: Targeting & Go-Live") spanning days or weeks. Daily digests are short summaries generated at end-of-day by /wrap-up, capturing what happened across all active sessions that day.

Folder structure

F:/Agentic-OS/ | +-- CLAUDE.md # Slimmed operating instructions (inherits from Claude Root) +-- SOUL.md # Agent personality + principles (from Agentic OS) +-- USER.md # User preferences + context (from Agentic OS) +-- AGENTS.md # Codex QC standing context | +-- .claude/ | +-- commands/ | | +-- resume.md # /resume slash command | | +-- bank.md # /bank slash command | | +-- wrap-up.md # /wrap-up slash command (meta-wrap-up skill) | +-- skills/ | | +-- os-heartbeat.md # Silent startup scan | | +-- os-session-manager.md # Session lifecycle (create/bank/resume/end) | | +-- meta-wrap-up.md # End-of-session banking + daily digest | +-- settings.local.json | +-- session-thread-register.sh | +-- bin/ | +-- claude-resume.cmd # Windows cold-boot wrapper | +-- claude-resume.sh # Bash variant | +-- memory/ | +-- working-context/ | | +-- _index.md # Session registry | | +-- session-N-*.md # Individual session files | +-- daily/ | | +-- 2026-05-06.md # Daily digest entries | +-- feedback_*.md # Canonical lessons | +-- CANONICAL_*.md # SoT documents | +-- MEMORY.md # Project index | +-- tools/ # Python tooling +-- config/ # Brand, product, creative config +-- F:/Claude Root/ # READ-ONLY legacy archive

OS skills

4 skills
SkillTriggerWhat it doesStatus
os-heartbeat Session startup (silent) Scans pending tasks, checks session file freshness, runs RAG pre-flight, detects stale bank entries. Replaces the manual startup checklist from CLAUDE.md §3. SHIPPED
os-session-manager /resume, /bank, session create/end Full session lifecycle: create numbered sessions, bank state, resume with Sonnet catch-up, end and archive. Wraps the existing slash commands with structured skill logic. SHIPPED
meta-wrap-up /wrap-up Automated end-of-session: banks current session, generates a daily digest entry (cross-session summary for the day), reconciles which skills were used and flags any unresolved items. SHIPPED
os-upstream-monitor Manual / periodic Monitors Simon Willison's Agentic OS repo for useful changes. Cherry-picks selectively rather than running update.sh (which would overwrite local CLAUDE.md and skills). Logs cherry-pick decisions. PATTERN

What came from where

Brainzyme Original

  • Numbered sessions (session-N-*.md)
  • Bank-and-trim lifecycle
  • Thread UUID tracking (PostToolUse hook)
  • Kanban sync to Command Centre
  • Sonnet subagent catch-up brief
  • Lightweight vs heavyweight resume modes
  • RAG pre-flight (Pinecone)
  • Codex QC peer loop
  • Delegation-first triage

Simon's Agentic OS

  • SOUL.md / USER.md separation
  • Heartbeat skill (startup scan)
  • Wrap-up skill (end-of-session)
  • Progressive disclosure (load on demand)
  • Daily memory / digest pattern
  • Skills-as-markdown convention
  • Upstream update mechanism (update.sh)

New Hybrid (v0.3.0)

  • Daily digest generated FROM session banks
  • Cherry-pick upstream (not update.sh)
  • Slimmed CLAUDE.md (detail in SOUL.md + skills)
  • Single folder consolidation (F:/Agentic-OS/)
  • Brainzyme dashboard as primary CC
  • Upstream monitor pattern

Session Management

3 components live

Session Resume Command

LIVE

One-command recovery for sessions whose thread has ended, crashed, or been compacted past usefulness. Shell wrapper claude-resume <N> launches Opus 4.7 + max-effort thinking + 1M-context attempt and fires the in-session /resume <N> slash command, which reads the session file's frontmatter and body, dispatches a Sonnet subagent to read the prior .jsonl transcripts, and presents a visible catch-up brief alongside the session file's pending sections before resuming.

Thread-Register PostToolUse Hook

LIVE

Auto-stamps the current Claude Code thread UUID into continuationSessionIds of any working-context session file you edit. Idempotent. Silent. Never blocks the underlying Write/Edit. Eliminates manual bookkeeping of which threads belong to which session — by editing a session file, the thread declares itself.

Bank + High-Context Playbook

LIVE

/bank <N> writes current thread state to the session file's append-only ## Bank log. Pairs with /clear + claude-resume <N> for the canonical "context climbing → compress → restart fresh" cycle. Cheaper and higher-fidelity than letting Claude auto-compact.

OS Skills (v0.3.0)

4 new components

os-heartbeat

SHIPPED

Silent startup scan skill. Runs automatically at session start: checks pending tasks registry, verifies session file freshness, triggers RAG pre-flight for ambiguous work, detects stale bank entries that need attention. Replaces the manual startup checklist previously baked into CLAUDE.md §3.

os-session-manager

SHIPPED

Session lifecycle management skill. Wraps the existing /resume and /bank slash commands with structured skill logic for creating, banking, resuming, and ending numbered sessions. Provides the backbone session management that the hybrid system builds upon.

meta-wrap-up (extended)

SHIPPED

End-of-session automation skill, extended for the hybrid system. Now handles both numbered session banking AND daily digest generation. When invoked via /wrap-up, it: (1) banks the current session, (2) generates a dated daily digest entry in memory/daily/ summarising all session activity for the day, (3) reconciles which skills were used and flags unresolved items for the next session.

Upstream Monitor Pattern

PATTERN

Describes the approach for monitoring Simon Willison's Agentic OS upstream repo and selectively cherry-picking useful changes. NOT a running skill yet — currently a documented pattern. Key principle: cherry-pick, never run update.sh (which would overwrite local CLAUDE.md, skills, and config). Gives full ownership of the hybrid system while still benefiting from upstream innovation.

Dashboard Push-Live Inbox (Phase 7 LIVE)

4 components · 11 dashboards

FastAPI inbox server

LIVE

The local receiver: F:/Agentic-OS/tools/inbox_server.py on 127.0.0.1:7392. Validates the canonical envelope (decisions / legacy_push_live kinds), enforces ALLOWED_DASHBOARDS slug allowlist, applies 30 req/min per-IP rate limit + 5-min idempotency dedup, writes JSON to inbox/<slug>/<ts>.json. Auto-starts at logon via the AgenticOS-InboxServer Scheduled Task (RestartOnFailure 5×, 1 min apart). v2.0.0 hardened by Codex 5.5 2026-05-11. Pythonw stdout-handle crash fixed by the inbox_server_launcher.py wrapper.

Cloudflare Tunnel transport (Phase 7 — 2026-05-17)

LIVE

Cut the dashboard→FastAPI path from Coolify reverse-proxy + private Tailnet (4 hops, 502'ing under load) to Cloudflare Tunnel direct (2 hops, encrypted). New public hostname inbox.nutritionalproducts.org/jobs. Auth model: cross-origin POST + x-api-token header + 2nd-chance source classification (loopback / Tailnet sources still allowed for backward-compat). Acceptance battery passed 3/4 (the 1 "fail" is /health design choice); real creative-review round-trip 513 KB + ft-tasks Pattern A round-trip both green; Playwright sweep 11/11 clean. 4 Codex passes · 11 HIGHs fixed inline · secrets-never-to-chat-or-git discipline banked in memory.

Shared widget + transport client

LIVE

_shared/feedback-widget.js renders the canonical 4-part feedback UI (3-state preset buttons + auto-saving free-text per decision + submit bar + localStorage queue fallback). _shared/inbox-client.js is the transport layer — pushToInbox() POSTs the spec-compliant decisions envelope cross-origin with x-api-token. Both shared so every new dashboard inherits transport + auth + cache-bust convention (?v=phase7-YYYYMMDD) for free.

Dashboard SOP — Build a New Dashboard

CANONICAL v2.1

The one canonical reference for any new feedback-collecting dashboard. Trigger phrases: "Build me a new dashboard" · "dashboard SOP" · "add feedback". Markdown SOP at reference/it-sops/dashboard-sop.md (renamed 2026-05-17 from dashboard-feedback-mechanism.md for discoverability). Includes the copy-paste starter template (§11), full procedure (§3-7), build checklist (§8), worked examples (qc-v4-sweep / ft-tasks / command-centre-v3 inline-quick-reply variant), and a Phase 7 changelog. The transport + UI primitives are fixed; only the per-dashboard cards: [...] config changes.

Transparent record of every non-obvious choice made during the build. Calum signed off mid-build and gave the build full autonomy. This log is what was decided, what was rejected, and why — so future-Claude (or future-Calum) can read the trail and either trust it or reverse it cheaply.

2026-05-17 · SHIPPED Phase 7 — Inbox transport: Cloudflare Tunnel direct (replacing Coolify proxy chain)
2026-05-16 to 2026-05-17 · session 49B · 4 Codex passes · 11 HIGHs fixed inline

CHOSE Cloudflare Tunnel direct to FastAPI on localhost:7392. New public hostname inbox.nutritionalproducts.org/jobs; cross-origin POST with x-api-token header (defense-in-depth, real auth = parent basic-auth + CORS allowlist + envelope validator + rate-limit + idempotency); 2nd-chance source classification so loopback + Tailnet sources still work (backward-compat). 2 hops instead of 4.

REJECTED Debug the Coolify 502s in place. 4-hop chain (CF WAF → Traefik → Tailnet → FastAPI) is fragile by design — restart-fixes-temporarily symptom + GET works/POST doesn't was unfixable without architectural change.

REJECTED v1 brief: CF Tunnel + reuse Coolify basic-auth on the browser fetch. Codex pass 1 HIGH — basic-auth header leaks admin:correct-horse… across every dashboard. x-api-token is the correct primitive.

REJECTED v2 brief: install token printed to chat. Codex pass 2 HIGH 2 — chat token persists in .jsonl session log indefinitely. Must be ACL-restricted file with 10-min auto-delete, never to chat.

REJECTED v3 brief: _writes.json carries install token. Codex pass 3 HIGH 1 — _writes.json goes to git. Token NEVER persists there. IDs only by design.

Why Phase 7 wins: the 502s are gone (proven live), the auth model is layered (5 controls behind x-api-token), the install-token handoff is forensically defensible (4 Codex passes worth of paranoia banked into the executor), and rollback is ~60s (revert + Coolify route still wired for 7 days). Acceptance: 4-test battery green · real creative-review round-trip 513 KB envelope · ft-tasks Pattern A round-trip · Playwright sweep 11/11 clean · canonical-sync BackgroundTask still fires for the creative-review review.json git push. SOP at apps.nutritionalproducts.org/it-sop/inbox-cloudflare-tunnel/. Memory: feedback_inbox_cloudflare_tunnel.md.

2026-05-17 · SHIPPED Dashboard SOP rename for discoverability — dashboard-feedback-mechanism.mddashboard-sop.md
2026-05-17 · session 49B · Calum directive

CHOSE Rename the canonical SOP file so the filename matches Calum's natural trigger phrases ("Build me a new dashboard" / "dashboard SOP"). H1 updated to "Dashboard SOP — Build a New Dashboard". 8 current cross-references updated; historical refs in deployment-log + session files + bank logs left intentionally as-is (preserves timeline).

REJECTED Keep filename, add alias only. Calum said "you should call it [X]". A filename change makes the trigger phrase grep-able + tab-completable + URL-able, not just searchable in body text.

Why: the SOP exists to be FOUND. Discoverability beats stability when the file is internal-only. Cross-reference updates are mechanical (~10 places), historical references stay accurate as historical records. Now any future session typing "dashboard SOP" lands on the right file via either filename grep or trigger-phrase header match.

2026-05-13 · SHIPPED + IN-FLIGHT Tool Map → Master Sheet sync (Strategic option)
2026-05-13 · session 49 thread e1459f57-… · 3 options evaluated

CHOSE Strategic option: extend tools/sync_registries_to_master_sheet.py with a 3rd derived-mirror tab for reference/tool-map.md; add cross-links to all 3 Sheet tabs in the access-registry dashboard; resolve Stage 9 preflight FAIL via the existing (parallel-session-built) check-tool-map.py validator.

REJECTED Tactical — refresh existing tabs only. Doesn't surface tool-map; future drift inevitable.

REJECTED Tactical+ — one-off populate Tool Map tab manually. Creates drift risk; no sync script means manual maintenance every time the markdown changes.

Why Strategic wins: the cost of the 3rd-tab sync parser (~30 min) is amortized over every future tool-map edit. The two existing registry tabs were already on the same one-script pattern; staying consistent beats one-off scripts. Drift validator (check-tool-map.py) catches: MCP servers in .mcp.json not in tool-map / slash commands missing / skill-count divergence / unregistered hooks / specialist tool gaps. Now Stage 9 of preflight reports clean (3 informational WARNs only). Full executive brief: apps.nutritionalproducts.org/tool-map-sync/ — covers decision tree, what shipped, conclusions, and the still-in-flight ghost-dir source-ID workstream.

Mid-task discovery: check-tool-map.py was reported missing by Stage 9 but had already been built by the parallel session-49 thread yesterday (323 lines, working). Stage 9 FAIL was from a stale snapshot. Verified rather than rebuilt — saved ~30 min. Reinforces the "grep before claiming absence" memory rule.

v0.3.0 · CHOSE Hybrid Session Architecture — Option B (Full Hybrid)
2026-05-06 · Agentic OS migration · 3 options evaluated

CHOSE Option B: Full Hybrid. Numbered sessions as backbone + daily digest as complementary layer. Sessions remain the primary unit of work; daily digests provide the cross-session memory that Agentic OS's daily memory pattern offers. SOUL.md and USER.md adopted for progressive disclosure.

REJECTED Option A: Bolt-on. Too minimal — just adding SOUL.md/USER.md without rethinking the session model would have been cosmetic. No daily memory, no heartbeat, no wrap-up. Would have left the proven Brainzyme system unchanged but missed the architectural improvements.

REJECTED Option C: SQLite-backed sessions. Over-engineering. The markdown session files work. Adding a database layer would introduce complexity (schema migrations, query tooling, backup strategy) for marginal benefit. The hybrid model gets the architectural wins without the infrastructure burden.

Why B wins: Codex 5.5, Gemini 2.5 Pro, and market research all converged on Option B independently. The daily digest pattern from Agentic OS solves the real gap (cross-session memory decay) without replacing the proven numbered session system that's been working since April 2026.

v0.3.0 · CHOSE Cherry-Pick Not Pull — upstream monitoring strategy
2026-05-06 · Agentic OS upstream relationship

CHOSE Monitor Simon's upstream repo and cherry-pick useful changes selectively. Each cherry-pick is logged with rationale.

REJECTED Running update.sh from Simon's repo. It overwrites CLAUDE.md, skills, and config — would destroy every Brainzyme customisation on every update.

Why: Eliminates overwrite risk entirely. Gives full ownership of the hybrid system. The upstream repo is a source of ideas and patterns, not a dependency to track. If Simon ships a breakthrough feature, we cherry-pick the concept and implement it within the Brainzyme session model.

v0.3.0 · CHOSE Single Folder Consolidation — F:/Agentic-OS/ as sole project root
2026-05-06 · folder architecture

CHOSE F:/Agentic-OS/ as the sole active project root. All tools, config, memory, skills, and bin scripts live here.

REJECTED Running two folders (F:/Claude Root/ + F:/Agentic-OS/) in parallel. Dual roots create drift, confusion about which CLAUDE.md governs, and duplicate tool paths.

Why: F:/Claude Root/ becomes a read-only legacy archive. All new work happens in F:/Agentic-OS/. Clean break, no ambiguity about which folder is authoritative. Migration is one-time; references to F:/Claude Root/ in memory files are updated incrementally as files are touched.

v0.3.0 · CHOSE Command Centre Strategy — Brainzyme dashboard primary, Simon's CC secondary
2026-05-06 · dashboard architecture

CHOSE Keep the Brainzyme static dashboard (apps.nutritionalproducts.org) as the primary Command Centre. Simon's Next.js CC available as a secondary reference implementation.

REJECTED Replacing the Brainzyme dashboard with Simon's Next.js Command Centre. Windows compatibility risk (Node.js PATH issues, native deps), no session management UI, would lose the existing Kanban sync + creative library + audit tabs.

DEFERRED KV relay for dashboard decisions → local execution (Cloudflare KV as the bridge between dashboard UI clicks and local Claude Code execution).

Why: The static dashboard is proven, zero-dependency (just HTML/JS on GitHub Pages), and already has all the Brainzyme-specific tabs (creative library, Google Ads audit, on-page SEO, Kanban). Simon's CC is a React/Next.js app that would require hosting, env setup, and adaptation. The dashboard works. Don't fix what isn't broken.

v0.2.1 · CHOSE MCP context-bloat prune — ~80–125K tokens reclaimed
2026-05-02 · per Calum's "500-800K uncached tokens per turn" diagnosis

Diagnosed: session-start cost was 185–255K tokens (MCP tool schemas dominate). Removed from .mcp.json: gdrive (legacy, superseded by tools/bns_*.py), pinecone-b (rarely-queried secondary), stitch (Google web design MCP, not in active use). Calum manually disconnected 5 Claude.ai connectors (GitHub / ClickUp / Gmail / Calendar / Drive) at claude.ai → Settings → Connectors. Chrome extension "Claude in Chrome" disabled — was the single biggest MCP overhead (~30–45K tokens for 18 browser tools).

Why this beats DeepSeek v4 swap: the per-turn cost driver is context size, not which model is processing it. Cutting 80–125K tokens of permanent session overhead applies on every fresh thread (cold-start cost during bank+restart cycle). Banked decision record: memory/feedback_deepseek_v4_evaluation_2026_05_02.md documents why DeepSeek doesn't apply (subscription billing, not pay-per-token; data sovereignty). Lessons file: claude-setup/it-handbook/lessons-2026-05-02-mcp-context-pruning.md.

Realistic baseline post-prune: ~100–145K tokens at session start (down from 185–255K). 50K target unrealistic with active dataforseo + google-ads + occasional pinecone (those three alone are 60–100K). Activates next Antigravity restart.

v0.2.0 · CHOSE /bank-full — multi-destination bank with mandatory Codex 5.5 second-opinion
2026-05-02 · per Calum's "/bank is too narrow" feedback

Default /bank stays narrow (session file only) for high-frequency use. New /bank-full handles cross-cutting banks: classifies content (canonical-update / sop-revision / config-change / feature-shipped / intel-finding / etc.), proposes routing across a 14-row destination matrix (session file + Pinecone + Graphiti + new+existing feedback files + MEMORY.md + pending_tasks_registry + main+scoped CLAUDE.md + SOP files + backend registry + workstream changelog + hooks), runs HARDCODED Codex 5.5 review of the routing plan via codex_qc.py --mode decision, presents combined Claude+Codex plan as a checkbox checklist, and executes only what user confirms.

Why hardcoded Codex review (not optional): the value is the second-opinion BEFORE writing to multiple destinations. If Codex review is opt-in, it gets skipped under load. If hardcoded with a graceful "WARN if Codex errored" path, the second pair of eyes is always there OR clearly visible-as-skipped.

v0.2.0 · CHOSE Brainzyme Wiki — local Graphiti temporal knowledge graph
2026-05-01 · per Calum's "I want a wiki for evolving SOPs/facts"

Neo4j Desktop (local, free) + graphiti-core v0.29.0 + OpenAI for entity extraction. Group ID: brainzyme-wiki. Initial seed of 27 episodes (all memory/CANONICAL_*.md + memory/feedback_*_canonical.md + brands.json + matrix _meta + 4 key SOPs) yielded 178 entities + 415 RELATES_TO edges.

Three Python wrappers in tools/: graphiti_setup.py (initial seed), graphiti_sync.py (ongoing, mtime-tracked), graphiti_query.py (one-shot or REPL hybrid search).

21-test SOP-correctness battery results: 18/20 pass at recall@30, 15/20 at recall@15. Honest: Graphiti is strong on canonical structural facts (lexicon, GA4 property IDs, brand portfolio relationships) but weaker on procedural recipes and slash-command syntax. Use it for "what's the current canonical fact?" questions — not as a replacement for markdown SOPs or Pinecone semantic search. Re-evaluate value at 2-3 weeks based on actual usage.

v0.2.0 · CHOSE Multi-LLM Extension — Codex 5.5 as QC peer + Gemini for multimodal
2026-05-01 · in response to Opus 4.6/4.7 regression observed April 2026

Claude leads, Codex 5.5 challenges, user decides. Codex review fires auto on every Write/Edit (PostToolUse hook, deterministic) AND behaviorally on plans / agent prompts / multi-system decisions (CLAUDE.md §10 rule). Gemini handles video / PDF / audio — separately, on-demand, NOT a QC peer.

Codex stack: tools/codex_qc.py (4 modes: code/plan/agent-prompt/decision; stdin prompt to dodge Windows 32K argv limit; quota+auth error classifier; PATH-injection for codex.cmd). Hook at .claude/qc-codex-review.sh. Toggle via /qc-on (default) and /qc-off. Codex's standing context lives in F:\Claude Root\AGENTS.md (slim ~130-line QC brief, NOT a CLAUDE.md duplicate).

Gemini stack: tools/gemini_quick.py (CLI / Flash via OAuth, free tier; for transcripts / single PDF / image QA) + tools/gemini_pro.py (API / 2.5 Pro via Google AI Studio key; supports --youtube URLs and auto-routes to Files API for files >20MB; for frame-by-frame video / large PDF deep reads).

Why Codex over a second Claude: adversarial perspective comes from a different model. The QC framework is genuinely catching bugs — it found 9+ HIGH issues during this very build round (PATH inheritance, backslash-path normalization, hardcoded paths, idempotency bugs, file I/O error gaps, etc.) that an Opus-on-Opus review would more likely miss.

v0.1.4 · CHOSE Add --original heavyweight mode (default stays lightweight)
2026-04-30 · per Calum's question on resume semantics

Calum noticed claude-resume <N> creates a new thread and reads the old one's transcript via Sonnet — wondered if continuing the old thread directly would be better. Answer: both are valid, just for different cases. Now claude-resume <N> --original does the heavyweight resume (looks up most recent UUID via bin/_session_uuid.py and runs claude --resume <UUID>) so you're back IN the old thread with the full transcript loaded.

Why default stays lightweight: the /bank + /clear + claude-resume playbook exists precisely to dump bloated context and start fresh. If we made heavyweight the default, every post-bank resume would re-inherit the 500K context we just dumped — defeats the entire cost-saving purpose. Lightweight is the right default for the playbook; heavyweight is the right opt-in for crash recovery / quick breaks where you want zero state loss.

v0.1.3 · CHOSE Wrapper boots Opus 4.6 (not 4.7) by default
2026-04-30 · per Calum's direction

The claude-resume wrapper now passes --model claude-opus-4-6, overriding settings.local.json's claude-opus-4-7 default at the wrapper level only. Manual claude launches (not via wrapper) still get Opus 4.7.

Why: the resume cycle is mechanical (read bank log → dispatch sub-agent → present brief → wait for confirm) and doesn't need 4.7's deeper reasoning. Opus 4.6 is the only Opus tier that supports /fast mode for faster output, which is useful during mechanical work. Opus quality is preserved (no Sonnet downgrade); just a one-tier shift.

v0.1.2 · VERIFIED First production run of /resume <N> end-to-end
2026-04-29 · session 49 · thread b5b84157…

Calum invoked /resume 49 from a live thread. Run completed every step: session file located, frontmatter parsed, current thread UUID resolved (via mtime), UUID appended to continuationSessionIds via Edit, Sonnet subagent dispatched + structured catch-up brief returned, dual-view (brief + session file body) presented, confirm gate held until Calum said "yes". This is the first end-to-end production proof that the slash-command path works.

Hook anomaly logged: the PostToolUse thread-register hook did NOT fire on the session-file Edit — F:/Claude Root/.claude/session-thread-register.log does not exist, and the hook script writes a log line on every invocation. Hook IS registered in settings.local.json:288–298. Cause not yet diagnosed; possible candidates: Antigravity not loading settings.local.json on startup; cat of $CLAUDE_TOOL_INPUT reading nothing; permissions/exec issue. Workaround: always pass explicit N to /bank and /resume until hook is fixed (autodetect/empty-N path depends on the hook having stamped this thread's UUID at least once).

Still untested in production: claude-resume <N> cold-boot wrapper (this thread's launch path unknown), /bank end-to-end, 1M context beta.

v0.1.1 · CHOSE Make N optional in /bank and /resume via thread-UUID autodetect
2026-04-25 · UX fix from Calum's first-read review

v0.1 required /bank N and /resume N. Calum pointed out the friction: in 95% of cases you're already working on session X and want to bank/resume to session X — typing N is friction.

Both slash commands now branch: if N is provided, use it (with mismatch warning if it doesn't match autodetected current session). If N is omitted, resolve current thread UUID via mtime, grep session files for that UUID, and use the match. Multiple matches → ask. Zero matches → tell user to register the thread first or pass N explicitly.

Why: the thread-register hook already maintains a perfect lookup table (every session file's continuationSessionIds contains every thread UUID that's ever touched it). One grep gives us the authoritative answer. Cold-boot claude-resume N still requires N — there's no thread context yet. The override path stays useful for the rare case of banking to a different session.

CHOSE --effort max instead of "ultrathink" magic word
2026-04-24 · Resume wrapper · Settings defaults

Claude Code 2026 exposes --effort <low|medium|high|xhigh|max> as a first-class flag. "ultrathink" was the older prompt-body magic-word convention.

Why: Configuration > prompt content. Setting "effort": "max" in settings.local.json makes every launch high-effort by default without polluting prompts. The wrapper passes --effort max redundantly as belt-and-braces.

CHOSE mtime fallback as the PRIMARY thread-UUID resolver
2026-04-24 · /resume slash command Step 3

Probed $CLAUDE_SESSION_ID in this Claude Code version: not exposed to the main process (only inside hooks via JSON stdin). Confirmed via env | grep -i claude.

Why: Spec listed env-var as priority 1 with mtime fallback. Reality required swapping the order. ls -t *.jsonl | head -1 reliably surfaces the current thread because the active .jsonl is constantly being appended to. The hook still gets session_id via JSON stdin, so hook-side registration is unaffected.

DEFERRED 1M context window verification
2026-04-24 · settings.local.json betas key

Set "betas": ["context-1m-2024-08-07"] as best-effort. Did not have time to launch a fresh Claude session and verify the beta is actually being applied on the wire.

Why deferred: If Claude Code doesn't recognise this betas key in settings, it'll be silently ignored — no breakage. The wrapper still works at standard 200K context which is sufficient for most resume workflows. Calum can verify with claude -p "what's your context window?" on next launch and we adjust then. Marked in CLAUDE.md as pending verification.

CHOSE Hook lives flat in .claude/ not in .claude/hooks/
2026-04-24 · Hook script location

Spec proposed .claude/hooks/session-thread-register.sh. The existing 7 production hooks (banned-copy-gate.sh, shopify-drift-check.sh, etc.) all live flat in .claude/.

Why: Match existing convention. Splitting hooks into a sub-folder for one new file would create inconsistency without benefit. If the count grows past 12-ish, revisit.

CHOSE Hook uses CLAUDE_TOOL_INPUT="$(cat)" bash … wrapper, not stdin
2026-04-24 · Hook registration in settings.local.json

Existing project hooks (Shopify drift check, banned-copy gate) wrap the bash command with CLAUDE_TOOL_INPUT="$(cat)" bash … so the script reads input from an env var instead of stdin.

Why: Convention consistency. Also makes the script independently testable (you can export CLAUDE_TOOL_INPUT='{"…"}' and run the script directly to debug). The new script reads ${CLAUDE_TOOL_INPUT:-$(cat)} so it works either way.

CHOSE Read FULL session file body, not just frontmatter
2026-04-24 · /resume Step 2 · spec patch round 1

Initial spec said "extract from frontmatter." Calum caught a gap mid-conversation: the session file body has the actual Pending / Next-action / NEXT-SESSION RESUME sections that he maintains by hand.

Why: Frontmatter alone is the audit trail; body is where state actually lives. Spec + plan patched in real-time before the build started. The /resume Step 7 now displays both Sonnet's transcript-derived brief AND the session file's body sections — divergences between the two are useful signal.

CHOSE Append-only Bank log, no overwrites
2026-04-24 · /bank command Step 4

Bank entries accrete. Old entries are never modified. The most recent entry plus a top-of-file pointer carries the "current state" load.

Why: History is searchable and cheap. Overwriting loses the trail of how we got here, which is exactly the signal you want when a decision turns out to be wrong. The top-of-file pointer (## NEXT-SESSION RESUME + frontmatter last_touch) is what gets refreshed each time, so readers see fresh state without scrolling.

CHOSE Bank guards against empty banks
2026-04-24 · /bank command Step 2

If the thread has done no work (zero decisions, zero files touched, zero open loops), /bank refuses politely and exits without writing.

Why: Empty banks pollute the log with no-signal noise and dilute the value of meaningful banks. Better to fail loudly than write a vacuous entry. If you really want to bank "I did nothing", you can manually add a note.

REJECTED Auto-bank on context threshold
2026-04-24 · v1 scope decision

Considered: a hook or background process that detects when context crosses 60% / 80% and auto-fires /bank.

Why rejected: Surprising side effects. The user should be in control of when state gets written. Also, Claude Code doesn't reliably expose context-usage to hooks in this version — implementation would require polling or hacks. Manual /bank with a status-line warning (deferred to v1.1) is a cleaner path.

REJECTED Combine /bank and /clear into one /bank-and-clear command
2026-04-24 · /bank command scope

Considered: /bank-and-clear <N> as a one-shot.

Why rejected: /bank is reusable at any time — at end of milestones, before tool-heavy work, just to checkpoint. Coupling it to clearing locks it to one use case. The two-step flow (/bank then /clear) keeps the building blocks atomic. If a one-shot becomes a strong pattern, add it as a v1.1 alias.

REJECTED Native --resume <uuid> for Brainzyme session resume
2026-04-24 · Architecture review

Claude Code has built-in claude --resume <uuid> that literally continues a prior thread (loads its full history into a new process).

Why rejected for our use case: Native resume reloads the FULL prior context — exactly what we're trying to escape when the cycle is "context too high." Our flow deliberately starts a new thread with a small context, using Sonnet to summarise the old one. Native --resume is the right tool for "Claude crashed mid-task and I want to literally continue" — different problem.

CHOSE Three top-of-MEMORY.md positioning for the Claude OS section
2026-04-24 · MEMORY.md index

Added a dedicated ## Claude OS — Read for any session-management question section near the top of MEMORY.md, just after the User section and before Feedback & Behaviour.

Why: High-frequency reference. Future Claude sessions will hit session-management questions on every cold boot — putting it in Feedback & Behaviour buries it. Top-positioning makes the index do its job: surface the right doc fast.

"claude-resume : The term 'claude-resume' is not recognized..."

The bin folder isn't on your user PATH. Run the one-time PowerShell setup from the Quick Start tab (top of page), then close and reopen the terminal. The shell wrapper file at F:\Agentic-OS\bin\claude-resume.cmd exists fine — it just can't be found by name without PATH.

"Unknown slash command" when I type /resume or /bank

The slash command file may not be where Claude Code looks. Check that F:/Agentic-OS/.claude/commands/resume.md and F:/Agentic-OS/.claude/commands/bank.md exist and that you launched Claude from the project root (F:/Agentic-OS/).

If the commands directory was created mid-session, restart Claude Code — slash commands are scanned at startup.

Resume gives me an empty Sonnet brief

Likely the primary .jsonl doesn't have what we expected at the tail. The /resume spec says: widen budget once and retry, then report honestly. If that still gives nothing, fall back to reading the session file body directly — your manual notes are authoritative.

Worth checking: ls -la "C:/Users/PC/.claude/projects/F--Claude-Root/<UUID>.jsonl" — if size is suspiciously small, the thread may have been truncated.

The hook isn't appending UUIDs to my session file

Check F:/Agentic-OS/.claude/session-thread-register.log for entries when you edit a session file. If it's silent:

1. Confirm the hook is registered in .claude/settings.local.json under hooks.PostToolUse with matcher Write|Edit.
2. Confirm the script is executable: ls -la "F:/Agentic-OS/.claude/session-thread-register.sh" should show an x permission.
3. Try a no-op edit (add then remove a space) to a session file — the hook should fire and log.

claude-resume launches Claude but the slash command doesn't fire

The wrapper calls claude --effort max "/resume N" — passing the prompt as a positional argument. If your installed Claude Code version handles positional prompts differently, the wrapper may need adjusting.

Quick fix: launch Claude manually then type /resume N. If that works, the wrapper's prompt-passing is the issue. Update F:/Agentic-OS/bin/claude-resume.cmd — try piping via stdin instead of positional.

1M context doesn't seem active

The betas setting is best-effort. If the beta header isn't being applied, you're at standard 200K. To verify:

From a fresh Claude session: "what's your context window?" — Claude usually answers honestly.

If 1M isn't active and you need it, ask Anthropic support for the current beta-header name in your account, then update settings.local.json betas array. Or accept 200K and bank more frequently.

I banked but the next thread doesn't see it

Most likely cause: you banked, then continued in the same thread, then ran /resume on the same session number — the resume picked up the conversation's transcript which already contained your live state, so the bank entry looks redundant.

The bank value shows up when you actually /clear (or close + reopen Claude) before resuming. The fresh thread starts blank and the bank entry becomes the primary catch-up source.

Working Principles

4 rules

1 · Delegate to Sonnet for mechanical reads

LIVE

Any .jsonl transcript, any file >500 lines, any bulk search across >5 files — dispatched to a Sonnet subagent, never read into the main Opus context. Documented in CLAUDE.md §3 "Delegation-First Triage" and feedback_delegation_triage.md.

2 · Session file IS your RAM

LIVE

Working-context files under memory/working-context/ are updated at every task milestone — objective, pending items, residual context. After thread death, the file is the primary source of truth; transcripts are secondary. Documented in CLAUDE.md §3 "Working Context Lifecycle."

3 · RAG pre-flight for ambiguous work

LIVE

Before any substantive answer on campaigns, creative, compliance, or any topic matching the trigger list, query the Pinecone claude01-v2 index first. Full SOP in feedback_rag_preflight_mandatory.md.

4 · Single source of truth per domain

LIVE

Product vocabulary lives in brainzyme_product_matrix.json. Creative rules live in creative_vault.json. Shopify publish rules live in the SOP. Claude OS components live here. Memory files point to the canonical source — they don't duplicate content.

Backlog

v1.1 candidates

KV Relay — Dashboard to PC execution relay via Cloudflare KV

PLANNED

Cloudflare KV as the bridge between dashboard UI clicks and local Claude Code execution. Dashboard writes a decision/action to KV; local polling agent picks it up and executes. Enables "click approve on the dashboard, Claude acts locally" without exposing local ports. Deferred from v0.3.0 Command Centre decision.

Status-line context warnings

IDEA

A status-line hook that flags when conversation context crosses configurable thresholds (e.g. 60% / 80%), prompting manual bank + restart before Claude's auto-compaction kicks in and loses fidelity. Depends on whether statusline.sh hooks receive token-count data in this Claude Code version (currently unverified).

Session-ID resolver shim

IDEA

Lightweight tool that reports the current Claude thread UUID without relying on env-var availability or mtime heuristics. Useful for hook portability and for any other tooling that needs to correlate thread state with external systems (Pinecone, ClickUp, dashboards).

Bank diffing

IDEA

/bank-diff <N> shows what's changed since the last bank entry — useful for "what did I actually accomplish this thread" checks before a major commit.

Bank-to-Pinecone sync

IDEA

Ship bank entries to claude01-v2 for semantic recall across sessions. Pairs with the existing Pinecone banking pattern.

What this is. A meta-analysis of the primary Brainzyme ad-ops Claude instance, done by a second observer instance with no business overlap. 23 subagents read all-time sessions + legacy Claude Root memory + raw .jsonl threads. Findings cross-reviewed by Codex 5.5 + Gemini 2.5 Pro.

Headline. The migration to Agentic OS + GSD genuinely fixed most legacy bugs (hook-bypass count went from 12 in 4 days → 0 in 3 days). The 15-specialist-session architecture is correct, not sprawl. Three vestigial issues remain: cross-domain drift, pending-task accretion, and invisible async fan-out density. Recommendation: add a supervisor session, not more rules.

Date. 2026-05-14. Reviewer model: Opus 4.7 (observer instance) + Codex 5.5 + Gemini 2.5 Pro.

Method

23 agents + 2 LLM peer reviews
WaveScopeAgentsWhat it read
1 — Recent sessions2026-04-23 → 2026-05-141016 active session files, pending-tasks-registry (63KB), the four largest (32/38/49/25)
2 — Peer review conveneSynthesis → frameworkCodex 5.5 (decision mode) + Gemini 2.5 Pro (independent)
3 — Historical sessionsPre-2026-04-23 + archived729 older sessions (S14-S55 not in wave 1) + 2 archived
4 — Legacy memoryF:/Claude Root/memory/ (264 files)4MEMORY.md + CANONICAL spines, oldest feedback (Apr 4-15), pre-migration feedback (May), project/reference files
5 — Raw .jsonl threadsVerbatim conversations2Biggest current thread (26MB, 9,407 lines) + oldest surviving (Mar 14)

Calibration — the 15-specialist model

CRITICAL CONTEXT

Calum runs 15 terminal windows in parallel, each session owning one specialist domain. This is not sprawl — it is intentional architecture. Table below is live — fetches sessions.json (the canonical Kanban source).

Update 2026-05-15 — the “one JSON” question now has an answer. Rather than mutating sessions.json with per-session pending_items[] (which would have meant editing the auto-sync hook + risking cross-session breakage), we shipped three new JSONs next to it: pending-triage-report.json, calum-decision-packet.json, fanout-summary.json. All three are read by the new Triage tab. The per-session pending dropdown below still shows COUNT only; the item-level detail lives on the Triage tab. This is the simpler, lower-risk answer.
Loading sessions.json…

Diagnosis recalibration: the issue is not that sessions exist or that they live long. The issue is that some sessions are drifting across their stated specialist boundaries, and some are paused at gates without an escalation reflex. Domain mapping below is Calum's mental model overlay; live status is from sessions.json.

What the migration to Agentic OS + GSD actually fixed

DON'T UNDO THIS
Pre-migration pain (Claude Root, Apr-May)Agentic OS fix — working
12 HOOK_BYPASS invocations in 4 days (May 1-4)0 bypasses in recent 3-day raw thread (e3c57133, 9,407 lines)
34 files with “canonical” in name; precedence warsSoT precedence + _consolidates frontmatter
reference_files.md 58KB hoarding flat list; Graphiti refused to ingestconnections-registry.md + tool-map.md with drift validators
feedback_ga4_canonical.md revised 3x in 5 daysCANONICAL_strategy.md with consolidation discipline
9 tools in March → 67 deferred-tool sprawlTool-map reconciliation rule + Step 0 RAG pre-flight

The 11 patterns — calibrated severity

23-agent consensus
#PatternSeverity nowEvidence
DPending-task graveyardACCELERATINGLegacy 302 lines → current 448 lines (today). Same disease, higher accretion.
ACross-domain drift (not sprawl)RESIDUAL4 of 15 specialist sessions absorbing out-of-domain work (32, 25, 38, 49)
XInvisible async fan-outNEW DIAGNOSIS80 TaskCreate dispatches in 75h via raw thread — invisible at session-file layer
BRecursive self-modificationSLOWINGS49 shipped 7 OS versions in one session; feedback_canonical_aspect_banking.md written into legacy 6d post-migration
IHook-bypass normalizationMOSTLY FIXEDRecent 3-day raw thread: 0 bypasses. Deterministic gates working.
CIdle-time aversionRESIDUALS38B dispatched 3 agents during Workspace block; bank entries' “Next action” never reads “wait”
EDocs-after-tryingRESIDUALS50: 8+ OAuth attempts before reading service contract; Connections Registry exists, reflex patchy
HStrategy never deploysCALUM-GATEDS20 honeypot 5wk pause; S46 SEO 3wk at deploy gate. Calum-blocker, not Claude-blocker.
FStop-loss asymmetryRESIDUAL$ cost tracked cleanly (CPC ceiling reverted at +121% CPA). Hour cost not tracked.
GVolume-as-value hoardingLEGACY ERAS25: 9,850 assets indexed, only 27 actioned. Less central in current work.
JSpecialist humilityCOUNTER-PATTERN (KEEP)S36 (legal), S47 (finance), S37 (IT) — calibrated hedging, no faked expertise
KClean closure when externally boundedCOUNTER-PATTERN (KEEP)S52: 32 files, £7.30 under £8 cap, ENDED. Hardware-limit accepted.

Three concrete fixes — how to go forward

orchestrator pattern

Fix 1 — Pending-task accretion

PROPOSED

Symptom: pending-tasks-registry.md at 448 lines and growing. Add rate > close rate. 22 Calum-gated tasks. The 15-min UI task blocked 3 weeks with no escalation.

Proposal: retire the markdown registry; promote GSD to sole owner.

  • Migration: one-off Codex-reviewed dump of the current 448-line registry into GSD tasks with proper labels (session, age, blocker-type).
  • Max-age cull: any GSD task >21 days with status READY/NOT STARTED/TO BUILD/DEFERRED auto-escalates: (a) close, (b) escalate to Calum with kill/keep ask, or (c) dated trigger. Weekly cron via tools/triage_pending_tasks.py.
  • Calum decision packet: 22 Calum-gated tasks → one weekly digest with default recommendations + a single-click approve/reject. Stops drip-feed blockers.
  • Acceptance criteria mandatory: no “Investigate X” / “Re-evaluate Y” without a measurable done-condition.
  • Vague-verb ban: reject task creation if subject starts with Investigate / Re-evaluate / Observe / Consider.

Why this works: the registry is currently doing two jobs (queue + audit log). GSD does the queue better; banks + memory do audit better. The markdown is duplication.

Fix 2 — Cross-domain drift (the real “sprawl” problem)

PROPOSED

Symptom: 4 of 15 specialist sessions are absorbing out-of-domain work. S32 (agent build) absorbed registry + canonical messaging. S25 (MarCom) absorbed cascade + dashboards + master-sheet sync. S38 (Shopify) ran 7 article variants in parallel.

Proposal: add a 16th “supervisor” session that routes incoming work to the correct specialist.

  • Supervisor session role: ingests Calum's new requests, reads the 15 specialist frontmatters (each declares its domain), routes the request to the right worker via GSD task with explicit owner.
  • Domain frontmatter (canonical): every worker session adds domain: + refuses_outside_domain: true to frontmatter. When a worker is asked to do off-domain work, it MUST bounce back to the supervisor.
  • Cross-session conflict detection: shared status file F:/Agentic-OS/.claude/multi-session-status.jsonl — every worker appends a status line on /bank with files-touched. Supervisor watches for collisions.
  • Domain drift alert: if a worker session bank entry mentions a file outside its domain’s expected file globs, supervisor flags it for Calum review.
  • Absorption discipline: declarative absorbed_sessions: [N, M] frontmatter requires same-day matching absorbed_into. No silent absorption.

Why this works: the 15-specialist model is the right architecture. The missing piece is a coordinator. Currently the “coordinator” is Calum running 15 windows; the supervisor session relieves that bottleneck.

Fix 3 — Invisible async fan-out density

PROPOSED

Symptom: raw .jsonl thread e3c57133 showed 80 TaskCreate dispatches in 75 hours (~1 every 55 min). Session files would say “dispatched a few agents.” The orchestration density is hidden at the layer humans read.

Proposal: instrument the fan-out + impose a daily dispatch budget per session.

  • Dispatch counter: PostToolUse hook on TaskCreate writes a line to .claude/fanout-log.jsonl with {session, ts, task_subject, parent_session}.
  • Per-session daily budget: default 20 dispatches/day/session. Hard-stops at 30 with an explicit Calum override required.
  • Density gauge in dashboard: new Claude OS tab section showing per-session dispatches over time. Visible orchestration density.
  • Cache-hit alarm: add a hook that surfaces “you've been in this thread for >72h” nudge to /bank + start a fresh thread.
  • Plan-check before fan-out >5: if a turn is about to dispatch >5 subagents in parallel, require an in-turn codex_qc.py --mode agent-prompt review first.

Why this works: “you can't manage what you don't measure.” The dispatch volume is real production work, but invisible. Surfacing it lets Calum see when a session is fan-out-bombing.

Bonus — Block discipline (addresses Pattern C)

PROPOSED

Gemini 2.5 Pro’s “Time-Out-On-Block (TOOB)” protocol. When a task is blocked on Calum >24h, the only valid next action is STATUS: BLOCKED_ESCALATED — no adjacent exploration.

Currently S38B and S47 are routing around Calum-blocks by generating execution theatre. The instance must become comfortable with waiting. The supervisor session enforces this.

Implementation order

smallest first
SHIPPEDWeek 1 — Instrumentation only (zero behaviour change)
2026-05-15 · effort: 1 day · commit 3a01ab62 + Agentic-OS feature/v4-migration

PostToolUse hook (.claude/hooks/fanout-counter.js) on TaskCreate live. tools/fanout_summary.py aggregates into the dashboard JSON. Read-only. Baseline forms over the next 7 days. View live density →

SHIPPEDBONUS — Option C pending-task discipline scripts
2026-05-15 · effort: 1 day · shipped alongside Week 1 because they pair naturally

Three read-only scripts on the existing markdown registry: triage_pending_tasks.py, calum_decision_packet.py, check_pending_task.py. New Triage tab in the pipeline kanban renders all three. No migration, no in-session disruption. View Triage tab →

Week 2 — Domain frontmatter + status file
~2026-05-22 → 2026-05-28 · effort: 2 days · awaits Calum verdict on FIX 2a card above

Add domain: + refuses_outside_domain: frontmatter to all 15 active sessions. Ship the shared multi-session-status.jsonl with /bank hook integration. Still no supervisor — just visibility.

Week 3 — Supervisor session pilot
~2026-05-29 → 2026-06-04 · effort: 3 days · awaits Calum verdict on FIX 2 card above

Spin up Session 60 (or rename one MISK) as the supervisor. Specialist domain: routing. It reads multi-session-status.jsonl + GSD + frontmatters, surfaces drift, suggests handoffs. Calum still does final routing — supervisor just recommends.

Week 4 — Triage baseline review
~2026-06-05 → 2026-06-11 · effort: half day

Re-open the Triage tab and check the +14d success criteria: stale count low, Calum-blocked dropped, missing-acceptance trending down. If not trending well, that's the trigger to revisit Fix 1 with a more aggressive option (A: build cross-session task store, or B: external system like Linear).

Week 5 — Block discipline + dispatch budgets
~2026-06-12 → 2026-06-18 · effort: 2 days · awaits Calum verdict on TOOB card above

Add TOOB enforcement: 24h block timer. Add the 20-dispatch-per-session daily budget. Both have soft warnings first, hard stops only after 2 weeks of fanout-summary baseline.

Industry positioning — Karpathy + Top-15 + your 3-LLM panel

more considered opinion

Karpathy — what's actually portable

Most of his published work is conceptually clarifying but not directly liftable. The one genuinely portable artefact: karpathy/autoresearch (March 2026) — frozen substrate + one mutable file + markdown prompt + scalar score. Loop: edit → 5-min run → score → keep/discard. 700 experiments / 2 days / 20 wins. This pattern maps onto your .claude/skills/ self-improvement directly — your meta-skill-creator already has the eval harness; what's missing is the overnight hill-climbing loop running it.

He also runs “tmux grids of agents with watcher scripts” (Fortune, March 2026) — that's literally your 15-session setup, but he's brute-forcing it without architecture. Your Command Centre + session-thread-register is ahead of where Karpathy is publicly. The LLM Wiki gist + Software 3.0 framing are useful vocabulary, not installable code.

Top 15 industry strategies — your top 3 highest-leverage

#TechniqueWhy this oneReference
1Eval-Driven Development (Braintrust pattern)You have 15 specialist sessions but no eval harness on their outputs. Add Braintrust-style eval set per session — compliance pass-rate, copy-quality vs canonical, schema validity. Every CLAUDE.md change becomes measured, not guessed. Highest ROI move.braintrust.dev
2Sleep-Time Compute (Letta)You already have inbox/cron. Add nightly job per session that re-reads day's transcripts, generates Reflexion-style reflections, pre-computes likely RAG queries. +18% accuracy, 2.5× cost reduction reported.arxiv 2504.13171
3Multi-Agent Peer Review with Reliability Weighting (A-HMAD)You already invoke 3 LLMs. Upgrade by tracking each reviewer's historical agreement-with-Calum per task class (compliance / code / copy / strategy) and weighting votes accordingly. +4-6% accuracy lift reported.ACL 2025
HMVoyager-style auto-skill libraryYour .claude/skills/ is one cron away from auto-extending when a pattern repeats 3+ times. Honourable mention.arxiv 2305.16291

Your 3-LLM convene approach — positioning

Your Opus + Codex 5.5 + Gemini 2.5 Pro pattern is the canonical “LLM Jury” (Cohere PoLL) + “Mixture of Agents” (Together AI, ICLR 2025 Spotlight) structure. 3 independent model families (Anthropic / OpenAI / Google), adversarial role assignment, verbatim surfacing — you nailed the structure. References: MoA, PoLL.

What you're doing better than research:

  • 3 independent model families (most teams use 3 variants of one)
  • Deterministic hook trigger on Write/Edit (most teams run QC manually)
  • feedback_fix_codex_highs_dont_halt discipline — research finds many panels halt and lose velocity
  • Codex sees only the artefact, not Claude's reasoning chain (avoids contagion on first reviewer)
  • Final actor preserved as Claude — most papers debate this; you've decided

Three gaps research says to close:

  • Differentiate prompts by role, not just model. Today Codex and Gemini both run “find issues” framing. Upgrade: Codex = adversarial correctness auditor; Gemini = independent strategic reviewer who receives the raw artefact ONLY (no Opus reasoning chain — prevents contagion). Codifies the Cohere PoLL “disjoint perspective” finding.
  • 30-50 item calibration set at tools/qc_calibration/ with known-correct judgements. Run quarterly through the panel. Without it, panel agreement = unmeasured confidence. With it, weight each reviewer per domain.
  • Explicit aggregation with disagreement-as-signal. Today HIGH-from-any-reviewer triggers action. Upgrade: 2-of-3 → BLOCK, 1-of-3 → SURFACE to Calum (not auto-block). Pattern: arxiv 2510.01499 — Optimal Weight.

Develop these — your decisions

Loading…

Each card has 3-state buttons (click once = selected, twice = confirmed, thrice = clear) and an auto-saving feedback textarea. Use the Send Feedback bar at the bottom of the page to push your decisions to the Inbox — matches the canonical pattern used across all dashboards (_shared/feedback-widget.js).

2 of 12 already shipped (2026-05-15). Fix 1 shipped as Option C (read-only discipline scripts on the existing markdown registry, not a destructive migration). Fix 3 shipped as Week 1 fan-out instrumentation. Both render as BANKED below and link to the live Triage tab. The other 10 cards are open for verdict + feedback.
FIX 1 SHIPPED 2026-05-15 Pending-task discipline (Option C)
Shipped as Option C, not the original migration plan. After honest risk analysis, the aggressive registry-retirement was net-positive but not pure upside (29 files reference the registry; 4 core skills consume it; risky migration window). Instead shipped 3 read-only discipline scripts on the existing markdown: triage_pending_tasks.py, calum_decision_packet.py, check_pending_task.py. New Triage tab in pipeline kanban renders all three as JSON. Open Triage tab →
SHIPPED 2026-05-15 · commit 3a01ab62 (brainzyme-git) + Agentic-OS feature/v4-migration · reversible by deleting 3 scripts
FIX 2Add a 16th “supervisor” session for routing + drift detection
Reads the 15 specialist frontmatters + GSD + a shared multi-session-status.jsonl. Routes new requests to the right worker, flags cross-domain drift (S32 doing Shopify, S25 doing cascade infra). Supervisor recommends; Calum still routes final.
effort: 3 days · new session number (60? or repurpose MISK 52) · depends on: domain frontmatter rolled out first
FIX 2aDomain frontmatter + multi-session-status.jsonl
Each session adds domain: + refuses_outside_domain: true to frontmatter. /bank hook appends to shared multi-session-status.jsonl with files-touched. Supervisor reads this. Foundation for FIX 2.
effort: 2 days · all 15 sessions need a one-line frontmatter add · ship before supervisor pilot
FIX 3 SHIPPED 2026-05-15 Instrument async fan-out (Week 1)
Shipped Week 1 instrumentation only. .claude/hooks/fanout-counter.js PostToolUse hook on TaskCreate appends per-dispatch records to .claude/fanout-log.jsonl. tools/fanout_summary.py aggregates into the dashboard JSON. Read-only / zero behaviour change. Budgets (20/day soft, 30/day hard) deferred to Week 5 after baseline established. Open Triage tab → (Section 3 = fan-out density).
SHIPPED 2026-05-15 · data populates as sessions dispatch · baseline review at +14 days
BONUSTime-Out-On-Block (TOOB) discipline
Task blocked on Calum >24h → only valid next action is STATUS: BLOCKED_ESCALATED. No adjacent exploration. Currently S38B and S47 are routing around Calum-blocks with execution theatre.
effort: 2 days · soft warnings first · hard stops after 2-week baseline

From Karpathy (one item)

KARPATHYAutoresearch loop for skill self-improvement
Adapt Karpathy's autoresearch pattern (frozen substrate + 1 mutable file + scalar score) to .claude/skills/. Overnight loop: pick a skill, propose a SKILL.md edit, run eval set, keep if score up. Requires the eval harness from item below.
effort: 3-4 days · depends on Eval-Driven Development · high upside, untested in this domain

From the Top-15 industry techniques

TOP-15 #1Eval-Driven Development per session
Build Braintrust-style eval set per session: compliance pass-rate, copy-quality vs canonical, schema validity, deploy success rate. Every CLAUDE.md / skill change is then measured, not guessed. Highest ROI move; everything else is downstream.
effort: 3 days · start with 3 highest-stakes sessions (32 / 38 / 46) · pre-req for Karpathy autoresearch loop
TOP-15 #2Sleep-Time Compute (overnight reflection per session)
Nightly cron per session: re-reads day's transcripts, generates Reflexion-style failure analysis to context/learnings.md, pre-computes likely next-day Pinecone queries. Letta reports +18% accuracy, 2.5× cost reduction.
effort: 2 days · uses existing inbox/cron infrastructure · runs while Calum sleeps
TOP-15 HMVoyager-style auto-skill promotion
When a pattern is observed 3+ times across sessions (same tool sequence, same prompt shape), auto-draft a skill at .claude/skills/_draft/. Calum reviews + promotes. Your skill folder becomes self-extending.
effort: 2 days · lower urgency — honourable mention · honestly might create skill-bloat

3-LLM panel upgrades (close research-identified gaps)

PANELDifferentiate prompts by reviewer role
Codex = adversarial correctness auditor (security, code bugs, type safety). Gemini = independent strategic reviewer who receives the raw artefact ONLY (no Opus reasoning chain — blocks contagion). Codify in tools/codex_qc.py and tools/gemini_pro.py wrapper prompts.
effort: 1 day · closes the contagion risk identified in the dream-review's own Gemini call
PANEL30-50 item calibration set at tools/qc_calibration/
Build known-correct judgement set: 10 real Liquid bugs, 5 false positives, 5 compliance fails, 5 valid overrides, 5 strategic blunders, etc. Run quarterly through the panel to measure each reviewer's precision/recall + detect drift. Derives weights per reviewer per domain.
effort: 2 days curation + 1 day automation · without this, panel agreement = unmeasured confidence
PANELAggregation step with disagreement-as-signal
Today HIGH-from-any-reviewer triggers action. Upgrade: 2-of-3 flag HIGH on same axis → BLOCK; 1-of-3 flag HIGH and other 2 silent → SURFACE for Calum (not auto-block, not auto-pass). Median-pool for severity, max-pool for hard-banned categories.
effort: 1 day · requires calibration set first to set weights · reduces over-blocking

Artifacts referenced

audit trail
PathWhat
F:\Agentic-OS\tmp\dream-review-synthesis-2026-05-14.mdSynthesis sent to Codex + Gemini
F:\Agentic-OS\context\sessions\session-21-cleanup-and-planning.mdLargest historical session — identified as “the genome” for 9 of 11 patterns
F:\Claude Root\memory\feedback_behaviour.mdOldest feedback file (Mar 15) — Calum’s 7 founding complaints
F:\Claude Root\sessions\67218b35-*.jsonlOldest raw thread (Mar 14) — original sin: ended with Claude symlinking its own filesystem