The hybrid AI-infrastructure layer merging Brainzyme® numbered sessions with Agentic OS patterns — session management, thread recovery, context hygiene, daily digests, and agentic workflow plumbing. Two layers working together: numbered sessions as the backbone, daily memory as the complementary layer.
You have four commands. One launches Claude with everything pre-configured for recovery. One catches you up inside a running Claude session. One saves your state mid-flight before context gets expensive. One wraps up the session with automated banking, daily digest generation, and skill reconciliation.
The cycle when context climbs: /bank → /clear → claude-resume N. That's it. The rest is detail.
One-time install — PATH setup for the shell wrapper
The shell wrapper lives at F:\Agentic-OS\bin\claude-resume.cmd. To call it as just claude-resume from any terminal, that folder needs to be on your user PATH. Run this in PowerShell once:
$bin = "F:\Agentic-OS\bin"
$path = [Environment]::GetEnvironmentVariable("Path", "User")
if (($path -split ";") -notcontains $bin) {
[Environment]::SetEnvironmentVariable("Path", "$path;$bin", "User")
Write-Output "Added: $bin"
} else { Write-Output "Already on PATH." }
After running it, close and reopen any open terminals — PATH changes only affect newly-launched processes. The change is permanent across reboots.
Boots Claude Code (Opus 4.6 + max-effort + 1M-context attempt) with two modes:
/resume <N>. Right after /bank + /clear for a clean low-cost restart.claude-resume <N> --original looks up most recent thread UUID and runs claude --resume <UUID>. You're back IN the old thread, full transcript loaded. Right for crash recovery / quick breaks. Defeats /bank+/clear cost saving — full prior context restored.claude-resume 32
claude-resume 32 --original
Without N, autodetects the current session by looking up which session file contains this thread's UUID. With explicit N, switches to (or refreshes from) a different session. Dispatches a Sonnet subagent to read prior .jsonl transcripts and presents a visible catch-up brief.
/resume
# or, to override:
/resume 32
Without N, autodetects the current session and banks there. With explicit N, banks to a different session (warns if N doesn't match the autodetected current session). Writes a dated ### Bank entry block to ## Bank log; refreshes the top-of-file pointer. Save-only — never auto-clears.
/bank
# or, to override:
/bank 32
Triggers the meta-wrap-up skill — automated end-of-session banking, daily digest generation, and skill reconciliation. Consolidates session findings into a dated daily memory entry and ensures nothing falls through the cracks between sessions.
/wrap-up
When you notice context climbing past ~500K tokens or the conversation feels slow, stop and run this:
/bank
/clear
claude-resume 32
No N needed for /bank — it autodetects the current session. claude-resume still requires N because the new terminal has no context to detect from yet. The new thread starts with a small context window and the freshly banked state as its first reference. Fidelity stays high; weekly usage stops bleeding.
After /bank: you'll see one line — Banked to session N (NAME). Ready to /clear and resume when you're ready. The conversation is unchanged. Keep working in this thread, or run /clear when you actually want a fresh start. The bank entry waits in the session file until you do.
| Situation | Command |
|---|---|
Thread crashed / closed / /clear'd, want to come back | claude-resume <N> from a fresh terminal |
| Inside Claude, want a refresher on what this thread's session was doing | /resume (no arg — autodetects) |
| Inside Claude, switching from one session to another | /resume <N> (explicit override) |
| Context approaching limit, restart soon makes sense | /bank → /clear → claude-resume <N> |
| Context high but you want to KEEP working in this thread | Claude Code native /compact (lossier but no restart) |
| End of a milestone, might come back later | /bank (no arg, no restart needed) |
| About to do extended tool-heavy work | Pre-emptive /bank first |
| Done for the day, want to close out cleanly | /wrap-up — banks + generates daily digest + reconciles skills |
Four tools that look similar but do different things. /bank mid-stream, /wrap-up at session end, /compact when context fills, /clear when you actually want to start fresh.
| Tool | What it does | Touches conversation? | Use when |
|---|---|---|---|
/compact | Anthropic-side context summarisation. Keeps the same thread/UUID alive. | Yes — replaces older turns with a summary | Context approaching limit but you want to keep working in the same thread. No save to disk. |
/clear | Wipes the in-memory conversation, same Claude process / terminal. New thread UUID starts. | Yes — empties it | You're done with the prior topic and want a clean slate without restarting Claude. |
/bank [N] | Writes thread state into context/sessions/session-N-*.md. Conversation continues. | No | Mid-stream save when you might keep going OR resume later. Save-only. |
/wrap-up | Full end-of-session checklist: bank + collect feedback + update learnings + commit. | Bank-style write + git commit | True end of working session. The "everything I should do before signing off" macro. |
claude or via claude-resume N) creates a brand new process AND new thread. Phone definitely sees a fresh boot.Recommended rhythm
/bank → /compact, no /clear. Cleanest pattern for keeping the phone-side mirror coherent. /compact is fuzzy but warm; /clear is cold. The only time /clear earns its keep is when context has gotten genuinely confused (off-topic drift, hallucinated state) and you want a deterministic restart — but in that case /bank then claude-resume N from PowerShell gives you the same clean slate plus a Sonnet brief, so /clear is rarely the best tool.
/wrap-up runs /bank as part of its checklist, then layers feedback collection, learnings updates, and a git commit on top. If you're treating each session-end as a transaction (commit included), use /wrap-up. If you're banking mid-stream knowing you'll keep going (or resume later from the phone), use /bank.
| Command | Where it runs | Args | What it does |
|---|---|---|---|
claude-resume <N> | Shell terminal (cmd.exe / git-bash / PowerShell — wrapper at F:/Agentic-OS/bin/claude-resume.cmd or .sh) | N = session number (required) | Launches Claude Code with default model + effort + betas from settings.local.json; first message is /resume N. N required because cold-boot has no context to autodetect. |
/resume [N] | Inside Claude Code | N = session number (optional) | If N omitted: autodetect via current thread UUID → which session file lists it. If N provided: glob that session. Then read frontmatter + body → dispatch Sonnet → present brief → confirm → resume. |
/bank [N] | Inside Claude Code | N = session number (optional) | If N omitted: autodetect current session. If N provided AND mismatches autodetected: warn before banking. Compose dated entry → append to ## Bank log → refresh top-of-file pointers. |
/wrap-up | Inside Claude Code | — | Triggers meta-wrap-up skill: banks session state, generates a daily digest entry from session activity, reconciles skills used in this session. End-of-day command. |
/clear (native) | Inside Claude Code | — | Clears the conversation history; starts a new thread on the same session ID. Pair with /bank first. |
/compact (native) | Inside Claude Code | — | Summarises the conversation in place. Lossier than bank+restart but doesn't require a new thread. |
| Path | Purpose |
|---|---|
F:/Agentic-OS/bin/claude-resume.cmd | Windows shell wrapper for cold-boot resume |
F:/Agentic-OS/bin/claude-resume.sh | Bash variant for git-bash |
F:/Agentic-OS/.claude/commands/resume.md | The /resume slash command body |
F:/Agentic-OS/.claude/commands/bank.md | The /bank slash command body |
F:/Agentic-OS/.claude/commands/wrap-up.md | The /wrap-up slash command body |
F:/Agentic-OS/.claude/session-thread-register.sh | PostToolUse hook that auto-stamps thread UUIDs into session frontmatter |
F:/Agentic-OS/.claude/settings.local.json | Defaults: model, effort, betas, plus the hook registration |
F:/Agentic-OS/memory/working-context/session-<N>-*.md | The session file. Body sections (## Objective, ## Pending, ## Bank log, ## NEXT-SESSION RESUME) are first-class state. |
F:/Agentic-OS/memory/daily/ | Daily digest entries generated by /wrap-up |
F:/Agentic-OS/SOUL.md | Agent personality and operating principles (from Agentic OS) |
F:/Agentic-OS/USER.md | User preferences and context (from Agentic OS) |
F:/Claude Root/ | Legacy project root (read-only archive) |
Top-level keys added to F:/Agentic-OS/.claude/settings.local.json:
| Key | Value | Effect |
|---|---|---|
model | "claude-opus-4-7" | Default model for all launches in this project |
effort | "max" | Highest extended-thinking tier. Replaces the older ultrathink magic word. |
betas | ["context-1m-2024-08-07"] | Best-effort 1M-context attempt. May be silently ignored if Claude Code doesn't recognise the beta header (see Build Decisions tab for why this is uncertain). |
hooks.PostToolUse | Matcher Write|Edit | Wraps the thread-register script with CLAUDE_TOOL_INPUT="$(cat)" bash … so the script reads stdin via the established convention. |
---
session: 32
name: Targeting & Go-Live
status: ACTIVE # or ENDED
started: 2026-04-12
last_updated: 2026-04-24 22:50 # auto-bumped by /bank and the hook
last_touch: One-line description of the most recent state
originSessionId: 7cc197d0-db4a-4801-af79-b6f454701b09 # first .jsonl UUID
continuationSessionIds: # appended automatically by the hook
- 3e0a1a9d-8727-46ec-b00f-be1c93095b68 # 2026-04-17 resume
- c13e1bca-8226-4b9e-a79a-355de8bb5fec # 2026-04-24 resume
absorbed_sessions: [28, 26] # optional — sessions merged into this one
inherited_from: 29 # optional — predecessor session
---
The model in one paragraph. A Brainzyme "session" (numbered 18, 32, 38, …) is a long-lived workstream that may span dozens of Claude Code threads. Each thread is a single .jsonl transcript file. The session file (session-N-*.md) is the durable record; the thread is the working memory. /bank writes from working memory to durable record. /resume reads the durable record + selectively reads transcripts via a Sonnet subagent. The PostToolUse hook keeps the durable-record-to-transcript map (continuationSessionIds) accurate without manual bookkeeping.
The Hybrid Model. Claude OS v0.3.0 merges Brainzyme® numbered sessions (the proven backbone built since April 2026) with selected patterns from Simon Willison's Agentic OS (SOUL.md/USER.md, heartbeat, wrap-up, progressive disclosure, daily memory). Two layers work together: numbered sessions remain the primary unit of work, and daily digests act as the complementary cross-session memory layer.
Numbered sessions are long-lived workstreams (e.g. "Session 32: Targeting & Go-Live") spanning days or weeks. Daily digests are short summaries generated at end-of-day by /wrap-up, capturing what happened across all active sessions that day.
| Skill | Trigger | What it does | Status |
|---|---|---|---|
os-heartbeat |
Session startup (silent) | Scans pending tasks, checks session file freshness, runs RAG pre-flight, detects stale bank entries. Replaces the manual startup checklist from CLAUDE.md §3. | SHIPPED |
os-session-manager |
/resume, /bank, session create/end |
Full session lifecycle: create numbered sessions, bank state, resume with Sonnet catch-up, end and archive. Wraps the existing slash commands with structured skill logic. | SHIPPED |
meta-wrap-up |
/wrap-up |
Automated end-of-session: banks current session, generates a daily digest entry (cross-session summary for the day), reconciles which skills were used and flags any unresolved items. | SHIPPED |
os-upstream-monitor |
Manual / periodic | Monitors Simon Willison's Agentic OS repo for useful changes. Cherry-picks selectively rather than running update.sh (which would overwrite local CLAUDE.md and skills). Logs cherry-pick decisions. |
PATTERN |
One-command recovery for sessions whose thread has ended, crashed, or been compacted past usefulness. Shell wrapper claude-resume <N> launches Opus 4.7 + max-effort thinking + 1M-context attempt and fires the in-session /resume <N> slash command, which reads the session file's frontmatter and body, dispatches a Sonnet subagent to read the prior .jsonl transcripts, and presents a visible catch-up brief alongside the session file's pending sections before resuming.
Auto-stamps the current Claude Code thread UUID into continuationSessionIds of any working-context session file you edit. Idempotent. Silent. Never blocks the underlying Write/Edit. Eliminates manual bookkeeping of which threads belong to which session — by editing a session file, the thread declares itself.
/bank <N> writes current thread state to the session file's append-only ## Bank log. Pairs with /clear + claude-resume <N> for the canonical "context climbing → compress → restart fresh" cycle. Cheaper and higher-fidelity than letting Claude auto-compact.
Silent startup scan skill. Runs automatically at session start: checks pending tasks registry, verifies session file freshness, triggers RAG pre-flight for ambiguous work, detects stale bank entries that need attention. Replaces the manual startup checklist previously baked into CLAUDE.md §3.
Session lifecycle management skill. Wraps the existing /resume and /bank slash commands with structured skill logic for creating, banking, resuming, and ending numbered sessions. Provides the backbone session management that the hybrid system builds upon.
End-of-session automation skill, extended for the hybrid system. Now handles both numbered session banking AND daily digest generation. When invoked via /wrap-up, it: (1) banks the current session, (2) generates a dated daily digest entry in memory/daily/ summarising all session activity for the day, (3) reconciles which skills were used and flags unresolved items for the next session.
Describes the approach for monitoring Simon Willison's Agentic OS upstream repo and selectively cherry-picking useful changes. NOT a running skill yet — currently a documented pattern. Key principle: cherry-pick, never run update.sh (which would overwrite local CLAUDE.md, skills, and config). Gives full ownership of the hybrid system while still benefiting from upstream innovation.
The local receiver: F:/Agentic-OS/tools/inbox_server.py on 127.0.0.1:7392. Validates the canonical envelope (decisions / legacy_push_live kinds), enforces ALLOWED_DASHBOARDS slug allowlist, applies 30 req/min per-IP rate limit + 5-min idempotency dedup, writes JSON to inbox/<slug>/<ts>.json. Auto-starts at logon via the AgenticOS-InboxServer Scheduled Task (RestartOnFailure 5×, 1 min apart). v2.0.0 hardened by Codex 5.5 2026-05-11. Pythonw stdout-handle crash fixed by the inbox_server_launcher.py wrapper.
Cut the dashboard→FastAPI path from Coolify reverse-proxy + private Tailnet (4 hops, 502'ing under load) to Cloudflare Tunnel direct (2 hops, encrypted). New public hostname inbox.nutritionalproducts.org/jobs. Auth model: cross-origin POST + x-api-token header + 2nd-chance source classification (loopback / Tailnet sources still allowed for backward-compat). Acceptance battery passed 3/4 (the 1 "fail" is /health design choice); real creative-review round-trip 513 KB + ft-tasks Pattern A round-trip both green; Playwright sweep 11/11 clean. 4 Codex passes · 11 HIGHs fixed inline · secrets-never-to-chat-or-git discipline banked in memory.
_shared/feedback-widget.js renders the canonical 4-part feedback UI (3-state preset buttons + auto-saving free-text per decision + submit bar + localStorage queue fallback). _shared/inbox-client.js is the transport layer — pushToInbox() POSTs the spec-compliant decisions envelope cross-origin with x-api-token. Both shared so every new dashboard inherits transport + auth + cache-bust convention (?v=phase7-YYYYMMDD) for free.
The one canonical reference for any new feedback-collecting dashboard. Trigger phrases: "Build me a new dashboard" · "dashboard SOP" · "add feedback". Markdown SOP at reference/it-sops/dashboard-sop.md (renamed 2026-05-17 from dashboard-feedback-mechanism.md for discoverability). Includes the copy-paste starter template (§11), full procedure (§3-7), build checklist (§8), worked examples (qc-v4-sweep / ft-tasks / command-centre-v3 inline-quick-reply variant), and a Phase 7 changelog. The transport + UI primitives are fixed; only the per-dashboard cards: [...] config changes.
Transparent record of every non-obvious choice made during the build. Calum signed off mid-build and gave the build full autonomy. This log is what was decided, what was rejected, and why — so future-Claude (or future-Calum) can read the trail and either trust it or reverse it cheaply.
CHOSE Cloudflare Tunnel direct to FastAPI on localhost:7392. New public hostname inbox.nutritionalproducts.org/jobs; cross-origin POST with x-api-token header (defense-in-depth, real auth = parent basic-auth + CORS allowlist + envelope validator + rate-limit + idempotency); 2nd-chance source classification so loopback + Tailnet sources still work (backward-compat). 2 hops instead of 4.
REJECTED Debug the Coolify 502s in place. 4-hop chain (CF WAF → Traefik → Tailnet → FastAPI) is fragile by design — restart-fixes-temporarily symptom + GET works/POST doesn't was unfixable without architectural change.
REJECTED v1 brief: CF Tunnel + reuse Coolify basic-auth on the browser fetch. Codex pass 1 HIGH — basic-auth header leaks admin:correct-horse… across every dashboard. x-api-token is the correct primitive.
REJECTED v2 brief: install token printed to chat. Codex pass 2 HIGH 2 — chat token persists in .jsonl session log indefinitely. Must be ACL-restricted file with 10-min auto-delete, never to chat.
REJECTED v3 brief: _writes.json carries install token. Codex pass 3 HIGH 1 — _writes.json goes to git. Token NEVER persists there. IDs only by design.
Why Phase 7 wins: the 502s are gone (proven live), the auth model is layered (5 controls behind x-api-token), the install-token handoff is forensically defensible (4 Codex passes worth of paranoia banked into the executor), and rollback is ~60s (revert + Coolify route still wired for 7 days). Acceptance: 4-test battery green · real creative-review round-trip 513 KB envelope · ft-tasks Pattern A round-trip · Playwright sweep 11/11 clean · canonical-sync BackgroundTask still fires for the creative-review review.json git push. SOP at apps.nutritionalproducts.org/it-sop/inbox-cloudflare-tunnel/. Memory: feedback_inbox_cloudflare_tunnel.md.
dashboard-feedback-mechanism.md → dashboard-sop.mdCHOSE Rename the canonical SOP file so the filename matches Calum's natural trigger phrases ("Build me a new dashboard" / "dashboard SOP"). H1 updated to "Dashboard SOP — Build a New Dashboard". 8 current cross-references updated; historical refs in deployment-log + session files + bank logs left intentionally as-is (preserves timeline).
REJECTED Keep filename, add alias only. Calum said "you should call it [X]". A filename change makes the trigger phrase grep-able + tab-completable + URL-able, not just searchable in body text.
Why: the SOP exists to be FOUND. Discoverability beats stability when the file is internal-only. Cross-reference updates are mechanical (~10 places), historical references stay accurate as historical records. Now any future session typing "dashboard SOP" lands on the right file via either filename grep or trigger-phrase header match.
CHOSE Strategic option: extend tools/sync_registries_to_master_sheet.py with a 3rd derived-mirror tab for reference/tool-map.md; add cross-links to all 3 Sheet tabs in the access-registry dashboard; resolve Stage 9 preflight FAIL via the existing (parallel-session-built) check-tool-map.py validator.
REJECTED Tactical — refresh existing tabs only. Doesn't surface tool-map; future drift inevitable.
REJECTED Tactical+ — one-off populate Tool Map tab manually. Creates drift risk; no sync script means manual maintenance every time the markdown changes.
Why Strategic wins: the cost of the 3rd-tab sync parser (~30 min) is amortized over every future tool-map edit. The two existing registry tabs were already on the same one-script pattern; staying consistent beats one-off scripts. Drift validator (check-tool-map.py) catches: MCP servers in .mcp.json not in tool-map / slash commands missing / skill-count divergence / unregistered hooks / specialist tool gaps. Now Stage 9 of preflight reports clean (3 informational WARNs only). Full executive brief: apps.nutritionalproducts.org/tool-map-sync/ — covers decision tree, what shipped, conclusions, and the still-in-flight ghost-dir source-ID workstream.
Mid-task discovery: check-tool-map.py was reported missing by Stage 9 but had already been built by the parallel session-49 thread yesterday (323 lines, working). Stage 9 FAIL was from a stale snapshot. Verified rather than rebuilt — saved ~30 min. Reinforces the "grep before claiming absence" memory rule.
CHOSE Option B: Full Hybrid. Numbered sessions as backbone + daily digest as complementary layer. Sessions remain the primary unit of work; daily digests provide the cross-session memory that Agentic OS's daily memory pattern offers. SOUL.md and USER.md adopted for progressive disclosure.
REJECTED Option A: Bolt-on. Too minimal — just adding SOUL.md/USER.md without rethinking the session model would have been cosmetic. No daily memory, no heartbeat, no wrap-up. Would have left the proven Brainzyme system unchanged but missed the architectural improvements.
REJECTED Option C: SQLite-backed sessions. Over-engineering. The markdown session files work. Adding a database layer would introduce complexity (schema migrations, query tooling, backup strategy) for marginal benefit. The hybrid model gets the architectural wins without the infrastructure burden.
Why B wins: Codex 5.5, Gemini 2.5 Pro, and market research all converged on Option B independently. The daily digest pattern from Agentic OS solves the real gap (cross-session memory decay) without replacing the proven numbered session system that's been working since April 2026.
CHOSE Monitor Simon's upstream repo and cherry-pick useful changes selectively. Each cherry-pick is logged with rationale.
REJECTED Running update.sh from Simon's repo. It overwrites CLAUDE.md, skills, and config — would destroy every Brainzyme customisation on every update.
Why: Eliminates overwrite risk entirely. Gives full ownership of the hybrid system. The upstream repo is a source of ideas and patterns, not a dependency to track. If Simon ships a breakthrough feature, we cherry-pick the concept and implement it within the Brainzyme session model.
CHOSE F:/Agentic-OS/ as the sole active project root. All tools, config, memory, skills, and bin scripts live here.
REJECTED Running two folders (F:/Claude Root/ + F:/Agentic-OS/) in parallel. Dual roots create drift, confusion about which CLAUDE.md governs, and duplicate tool paths.
Why: F:/Claude Root/ becomes a read-only legacy archive. All new work happens in F:/Agentic-OS/. Clean break, no ambiguity about which folder is authoritative. Migration is one-time; references to F:/Claude Root/ in memory files are updated incrementally as files are touched.
CHOSE Keep the Brainzyme static dashboard (apps.nutritionalproducts.org) as the primary Command Centre. Simon's Next.js CC available as a secondary reference implementation.
REJECTED Replacing the Brainzyme dashboard with Simon's Next.js Command Centre. Windows compatibility risk (Node.js PATH issues, native deps), no session management UI, would lose the existing Kanban sync + creative library + audit tabs.
DEFERRED KV relay for dashboard decisions → local execution (Cloudflare KV as the bridge between dashboard UI clicks and local Claude Code execution).
Why: The static dashboard is proven, zero-dependency (just HTML/JS on GitHub Pages), and already has all the Brainzyme-specific tabs (creative library, Google Ads audit, on-page SEO, Kanban). Simon's CC is a React/Next.js app that would require hosting, env setup, and adaptation. The dashboard works. Don't fix what isn't broken.
Diagnosed: session-start cost was 185–255K tokens (MCP tool schemas dominate). Removed from .mcp.json: gdrive (legacy, superseded by tools/bns_*.py), pinecone-b (rarely-queried secondary), stitch (Google web design MCP, not in active use). Calum manually disconnected 5 Claude.ai connectors (GitHub / ClickUp / Gmail / Calendar / Drive) at claude.ai → Settings → Connectors. Chrome extension "Claude in Chrome" disabled — was the single biggest MCP overhead (~30–45K tokens for 18 browser tools).
Why this beats DeepSeek v4 swap: the per-turn cost driver is context size, not which model is processing it. Cutting 80–125K tokens of permanent session overhead applies on every fresh thread (cold-start cost during bank+restart cycle). Banked decision record: memory/feedback_deepseek_v4_evaluation_2026_05_02.md documents why DeepSeek doesn't apply (subscription billing, not pay-per-token; data sovereignty). Lessons file: claude-setup/it-handbook/lessons-2026-05-02-mcp-context-pruning.md.
Realistic baseline post-prune: ~100–145K tokens at session start (down from 185–255K). 50K target unrealistic with active dataforseo + google-ads + occasional pinecone (those three alone are 60–100K). Activates next Antigravity restart.
/bank-full — multi-destination bank with mandatory Codex 5.5 second-opinionDefault /bank stays narrow (session file only) for high-frequency use. New /bank-full handles cross-cutting banks: classifies content (canonical-update / sop-revision / config-change / feature-shipped / intel-finding / etc.), proposes routing across a 14-row destination matrix (session file + Pinecone + Graphiti + new+existing feedback files + MEMORY.md + pending_tasks_registry + main+scoped CLAUDE.md + SOP files + backend registry + workstream changelog + hooks), runs HARDCODED Codex 5.5 review of the routing plan via codex_qc.py --mode decision, presents combined Claude+Codex plan as a checkbox checklist, and executes only what user confirms.
Why hardcoded Codex review (not optional): the value is the second-opinion BEFORE writing to multiple destinations. If Codex review is opt-in, it gets skipped under load. If hardcoded with a graceful "WARN if Codex errored" path, the second pair of eyes is always there OR clearly visible-as-skipped.
Neo4j Desktop (local, free) + graphiti-core v0.29.0 + OpenAI for entity extraction. Group ID: brainzyme-wiki. Initial seed of 27 episodes (all memory/CANONICAL_*.md + memory/feedback_*_canonical.md + brands.json + matrix _meta + 4 key SOPs) yielded 178 entities + 415 RELATES_TO edges.
Three Python wrappers in tools/: graphiti_setup.py (initial seed), graphiti_sync.py (ongoing, mtime-tracked), graphiti_query.py (one-shot or REPL hybrid search).
21-test SOP-correctness battery results: 18/20 pass at recall@30, 15/20 at recall@15. Honest: Graphiti is strong on canonical structural facts (lexicon, GA4 property IDs, brand portfolio relationships) but weaker on procedural recipes and slash-command syntax. Use it for "what's the current canonical fact?" questions — not as a replacement for markdown SOPs or Pinecone semantic search. Re-evaluate value at 2-3 weeks based on actual usage.
Claude leads, Codex 5.5 challenges, user decides. Codex review fires auto on every Write/Edit (PostToolUse hook, deterministic) AND behaviorally on plans / agent prompts / multi-system decisions (CLAUDE.md §10 rule). Gemini handles video / PDF / audio — separately, on-demand, NOT a QC peer.
Codex stack: tools/codex_qc.py (4 modes: code/plan/agent-prompt/decision; stdin prompt to dodge Windows 32K argv limit; quota+auth error classifier; PATH-injection for codex.cmd). Hook at .claude/qc-codex-review.sh. Toggle via /qc-on (default) and /qc-off. Codex's standing context lives in F:\Claude Root\AGENTS.md (slim ~130-line QC brief, NOT a CLAUDE.md duplicate).
Gemini stack: tools/gemini_quick.py (CLI / Flash via OAuth, free tier; for transcripts / single PDF / image QA) + tools/gemini_pro.py (API / 2.5 Pro via Google AI Studio key; supports --youtube URLs and auto-routes to Files API for files >20MB; for frame-by-frame video / large PDF deep reads).
Why Codex over a second Claude: adversarial perspective comes from a different model. The QC framework is genuinely catching bugs — it found 9+ HIGH issues during this very build round (PATH inheritance, backslash-path normalization, hardcoded paths, idempotency bugs, file I/O error gaps, etc.) that an Opus-on-Opus review would more likely miss.
--original heavyweight mode (default stays lightweight)Calum noticed claude-resume <N> creates a new thread and reads the old one's transcript via Sonnet — wondered if continuing the old thread directly would be better. Answer: both are valid, just for different cases. Now claude-resume <N> --original does the heavyweight resume (looks up most recent UUID via bin/_session_uuid.py and runs claude --resume <UUID>) so you're back IN the old thread with the full transcript loaded.
Why default stays lightweight: the /bank + /clear + claude-resume playbook exists precisely to dump bloated context and start fresh. If we made heavyweight the default, every post-bank resume would re-inherit the 500K context we just dumped — defeats the entire cost-saving purpose. Lightweight is the right default for the playbook; heavyweight is the right opt-in for crash recovery / quick breaks where you want zero state loss.
The claude-resume wrapper now passes --model claude-opus-4-6, overriding settings.local.json's claude-opus-4-7 default at the wrapper level only. Manual claude launches (not via wrapper) still get Opus 4.7.
Why: the resume cycle is mechanical (read bank log → dispatch sub-agent → present brief → wait for confirm) and doesn't need 4.7's deeper reasoning. Opus 4.6 is the only Opus tier that supports /fast mode for faster output, which is useful during mechanical work. Opus quality is preserved (no Sonnet downgrade); just a one-tier shift.
/resume <N> end-to-endCalum invoked /resume 49 from a live thread. Run completed every step: session file located, frontmatter parsed, current thread UUID resolved (via mtime), UUID appended to continuationSessionIds via Edit, Sonnet subagent dispatched + structured catch-up brief returned, dual-view (brief + session file body) presented, confirm gate held until Calum said "yes". This is the first end-to-end production proof that the slash-command path works.
Hook anomaly logged: the PostToolUse thread-register hook did NOT fire on the session-file Edit — F:/Claude Root/.claude/session-thread-register.log does not exist, and the hook script writes a log line on every invocation. Hook IS registered in settings.local.json:288–298. Cause not yet diagnosed; possible candidates: Antigravity not loading settings.local.json on startup; cat of $CLAUDE_TOOL_INPUT reading nothing; permissions/exec issue. Workaround: always pass explicit N to /bank and /resume until hook is fixed (autodetect/empty-N path depends on the hook having stamped this thread's UUID at least once).
Still untested in production: claude-resume <N> cold-boot wrapper (this thread's launch path unknown), /bank end-to-end, 1M context beta.
/bank and /resume via thread-UUID autodetectv0.1 required /bank N and /resume N. Calum pointed out the friction: in 95% of cases you're already working on session X and want to bank/resume to session X — typing N is friction.
Both slash commands now branch: if N is provided, use it (with mismatch warning if it doesn't match autodetected current session). If N is omitted, resolve current thread UUID via mtime, grep session files for that UUID, and use the match. Multiple matches → ask. Zero matches → tell user to register the thread first or pass N explicitly.
Why: the thread-register hook already maintains a perfect lookup table (every session file's continuationSessionIds contains every thread UUID that's ever touched it). One grep gives us the authoritative answer. Cold-boot claude-resume N still requires N — there's no thread context yet. The override path stays useful for the rare case of banking to a different session.
--effort max instead of "ultrathink" magic wordClaude Code 2026 exposes --effort <low|medium|high|xhigh|max> as a first-class flag. "ultrathink" was the older prompt-body magic-word convention.
Why: Configuration > prompt content. Setting "effort": "max" in settings.local.json makes every launch high-effort by default without polluting prompts. The wrapper passes --effort max redundantly as belt-and-braces.
Probed $CLAUDE_SESSION_ID in this Claude Code version: not exposed to the main process (only inside hooks via JSON stdin). Confirmed via env | grep -i claude.
Why: Spec listed env-var as priority 1 with mtime fallback. Reality required swapping the order. ls -t *.jsonl | head -1 reliably surfaces the current thread because the active .jsonl is constantly being appended to. The hook still gets session_id via JSON stdin, so hook-side registration is unaffected.
Set "betas": ["context-1m-2024-08-07"] as best-effort. Did not have time to launch a fresh Claude session and verify the beta is actually being applied on the wire.
Why deferred: If Claude Code doesn't recognise this betas key in settings, it'll be silently ignored — no breakage. The wrapper still works at standard 200K context which is sufficient for most resume workflows. Calum can verify with claude -p "what's your context window?" on next launch and we adjust then. Marked in CLAUDE.md as pending verification.
.claude/ not in .claude/hooks/Spec proposed .claude/hooks/session-thread-register.sh. The existing 7 production hooks (banned-copy-gate.sh, shopify-drift-check.sh, etc.) all live flat in .claude/.
Why: Match existing convention. Splitting hooks into a sub-folder for one new file would create inconsistency without benefit. If the count grows past 12-ish, revisit.
CLAUDE_TOOL_INPUT="$(cat)" bash … wrapper, not stdinExisting project hooks (Shopify drift check, banned-copy gate) wrap the bash command with CLAUDE_TOOL_INPUT="$(cat)" bash … so the script reads input from an env var instead of stdin.
Why: Convention consistency. Also makes the script independently testable (you can export CLAUDE_TOOL_INPUT='{"…"}' and run the script directly to debug). The new script reads ${CLAUDE_TOOL_INPUT:-$(cat)} so it works either way.
Initial spec said "extract from frontmatter." Calum caught a gap mid-conversation: the session file body has the actual Pending / Next-action / NEXT-SESSION RESUME sections that he maintains by hand.
Why: Frontmatter alone is the audit trail; body is where state actually lives. Spec + plan patched in real-time before the build started. The /resume Step 7 now displays both Sonnet's transcript-derived brief AND the session file's body sections — divergences between the two are useful signal.
Bank entries accrete. Old entries are never modified. The most recent entry plus a top-of-file pointer carries the "current state" load.
Why: History is searchable and cheap. Overwriting loses the trail of how we got here, which is exactly the signal you want when a decision turns out to be wrong. The top-of-file pointer (## NEXT-SESSION RESUME + frontmatter last_touch) is what gets refreshed each time, so readers see fresh state without scrolling.
If the thread has done no work (zero decisions, zero files touched, zero open loops), /bank refuses politely and exits without writing.
Why: Empty banks pollute the log with no-signal noise and dilute the value of meaningful banks. Better to fail loudly than write a vacuous entry. If you really want to bank "I did nothing", you can manually add a note.
Considered: a hook or background process that detects when context crosses 60% / 80% and auto-fires /bank.
Why rejected: Surprising side effects. The user should be in control of when state gets written. Also, Claude Code doesn't reliably expose context-usage to hooks in this version — implementation would require polling or hacks. Manual /bank with a status-line warning (deferred to v1.1) is a cleaner path.
Considered: /bank-and-clear <N> as a one-shot.
Why rejected: /bank is reusable at any time — at end of milestones, before tool-heavy work, just to checkpoint. Coupling it to clearing locks it to one use case. The two-step flow (/bank then /clear) keeps the building blocks atomic. If a one-shot becomes a strong pattern, add it as a v1.1 alias.
--resume <uuid> for Brainzyme session resumeClaude Code has built-in claude --resume <uuid> that literally continues a prior thread (loads its full history into a new process).
Why rejected for our use case: Native resume reloads the FULL prior context — exactly what we're trying to escape when the cycle is "context too high." Our flow deliberately starts a new thread with a small context, using Sonnet to summarise the old one. Native --resume is the right tool for "Claude crashed mid-task and I want to literally continue" — different problem.
Added a dedicated ## Claude OS — Read for any session-management question section near the top of MEMORY.md, just after the User section and before Feedback & Behaviour.
Why: High-frequency reference. Future Claude sessions will hit session-management questions on every cold boot — putting it in Feedback & Behaviour buries it. Top-positioning makes the index do its job: surface the right doc fast.
The bin folder isn't on your user PATH. Run the one-time PowerShell setup from the Quick Start tab (top of page), then close and reopen the terminal. The shell wrapper file at F:\Agentic-OS\bin\claude-resume.cmd exists fine — it just can't be found by name without PATH.
The slash command file may not be where Claude Code looks. Check that F:/Agentic-OS/.claude/commands/resume.md and F:/Agentic-OS/.claude/commands/bank.md exist and that you launched Claude from the project root (F:/Agentic-OS/).
If the commands directory was created mid-session, restart Claude Code — slash commands are scanned at startup.
Likely the primary .jsonl doesn't have what we expected at the tail. The /resume spec says: widen budget once and retry, then report honestly. If that still gives nothing, fall back to reading the session file body directly — your manual notes are authoritative.
Worth checking: ls -la "C:/Users/PC/.claude/projects/F--Claude-Root/<UUID>.jsonl" — if size is suspiciously small, the thread may have been truncated.
Check F:/Agentic-OS/.claude/session-thread-register.log for entries when you edit a session file. If it's silent:
1. Confirm the hook is registered in .claude/settings.local.json under hooks.PostToolUse with matcher Write|Edit.
2. Confirm the script is executable: ls -la "F:/Agentic-OS/.claude/session-thread-register.sh" should show an x permission.
3. Try a no-op edit (add then remove a space) to a session file — the hook should fire and log.
The wrapper calls claude --effort max "/resume N" — passing the prompt as a positional argument. If your installed Claude Code version handles positional prompts differently, the wrapper may need adjusting.
Quick fix: launch Claude manually then type /resume N. If that works, the wrapper's prompt-passing is the issue. Update F:/Agentic-OS/bin/claude-resume.cmd — try piping via stdin instead of positional.
The betas setting is best-effort. If the beta header isn't being applied, you're at standard 200K. To verify:
From a fresh Claude session: "what's your context window?" — Claude usually answers honestly.
If 1M isn't active and you need it, ask Anthropic support for the current beta-header name in your account, then update settings.local.json betas array. Or accept 200K and bank more frequently.
Most likely cause: you banked, then continued in the same thread, then ran /resume on the same session number — the resume picked up the conversation's transcript which already contained your live state, so the bank entry looks redundant.
The bank value shows up when you actually /clear (or close + reopen Claude) before resuming. The fresh thread starts blank and the bank entry becomes the primary catch-up source.
Any .jsonl transcript, any file >500 lines, any bulk search across >5 files — dispatched to a Sonnet subagent, never read into the main Opus context. Documented in CLAUDE.md §3 "Delegation-First Triage" and feedback_delegation_triage.md.
Working-context files under memory/working-context/ are updated at every task milestone — objective, pending items, residual context. After thread death, the file is the primary source of truth; transcripts are secondary. Documented in CLAUDE.md §3 "Working Context Lifecycle."
Before any substantive answer on campaigns, creative, compliance, or any topic matching the trigger list, query the Pinecone claude01-v2 index first. Full SOP in feedback_rag_preflight_mandatory.md.
Product vocabulary lives in brainzyme_product_matrix.json. Creative rules live in creative_vault.json. Shopify publish rules live in the SOP. Claude OS components live here. Memory files point to the canonical source — they don't duplicate content.
Cloudflare KV as the bridge between dashboard UI clicks and local Claude Code execution. Dashboard writes a decision/action to KV; local polling agent picks it up and executes. Enables "click approve on the dashboard, Claude acts locally" without exposing local ports. Deferred from v0.3.0 Command Centre decision.
A status-line hook that flags when conversation context crosses configurable thresholds (e.g. 60% / 80%), prompting manual bank + restart before Claude's auto-compaction kicks in and loses fidelity. Depends on whether statusline.sh hooks receive token-count data in this Claude Code version (currently unverified).
Lightweight tool that reports the current Claude thread UUID without relying on env-var availability or mtime heuristics. Useful for hook portability and for any other tooling that needs to correlate thread state with external systems (Pinecone, ClickUp, dashboards).
/bank-diff <N> shows what's changed since the last bank entry — useful for "what did I actually accomplish this thread" checks before a major commit.
Ship bank entries to claude01-v2 for semantic recall across sessions. Pairs with the existing Pinecone banking pattern.
What this is. A meta-analysis of the primary Brainzyme ad-ops Claude instance, done by a second observer instance with no business overlap. 23 subagents read all-time sessions + legacy Claude Root memory + raw .jsonl threads. Findings cross-reviewed by Codex 5.5 + Gemini 2.5 Pro.
Headline. The migration to Agentic OS + GSD genuinely fixed most legacy bugs (hook-bypass count went from 12 in 4 days → 0 in 3 days). The 15-specialist-session architecture is correct, not sprawl. Three vestigial issues remain: cross-domain drift, pending-task accretion, and invisible async fan-out density. Recommendation: add a supervisor session, not more rules.
Date. 2026-05-14. Reviewer model: Opus 4.7 (observer instance) + Codex 5.5 + Gemini 2.5 Pro.
| Wave | Scope | Agents | What it read |
|---|---|---|---|
| 1 — Recent sessions | 2026-04-23 → 2026-05-14 | 10 | 16 active session files, pending-tasks-registry (63KB), the four largest (32/38/49/25) |
| 2 — Peer review convene | Synthesis → framework | — | Codex 5.5 (decision mode) + Gemini 2.5 Pro (independent) |
| 3 — Historical sessions | Pre-2026-04-23 + archived | 7 | 29 older sessions (S14-S55 not in wave 1) + 2 archived |
| 4 — Legacy memory | F:/Claude Root/memory/ (264 files) | 4 | MEMORY.md + CANONICAL spines, oldest feedback (Apr 4-15), pre-migration feedback (May), project/reference files |
| 5 — Raw .jsonl threads | Verbatim conversations | 2 | Biggest current thread (26MB, 9,407 lines) + oldest surviving (Mar 14) |
Calum runs 15 terminal windows in parallel, each session owning one specialist domain. This is not sprawl — it is intentional architecture. Table below is live — fetches sessions.json (the canonical Kanban source).
sessions.json with per-session pending_items[] (which would have meant editing the auto-sync hook + risking cross-session breakage), we shipped three new JSONs next to it: pending-triage-report.json, calum-decision-packet.json, fanout-summary.json. All three are read by the new Triage tab. The per-session pending dropdown below still shows COUNT only; the item-level detail lives on the Triage tab. This is the simpler, lower-risk answer.
Diagnosis recalibration: the issue is not that sessions exist or that they live long. The issue is that some sessions are drifting across their stated specialist boundaries, and some are paused at gates without an escalation reflex. Domain mapping below is Calum's mental model overlay; live status is from sessions.json.
| Pre-migration pain (Claude Root, Apr-May) | Agentic OS fix — working |
|---|---|
12 HOOK_BYPASS invocations in 4 days (May 1-4) | 0 bypasses in recent 3-day raw thread (e3c57133, 9,407 lines) |
| 34 files with “canonical” in name; precedence wars | SoT precedence + _consolidates frontmatter |
reference_files.md 58KB hoarding flat list; Graphiti refused to ingest | connections-registry.md + tool-map.md with drift validators |
feedback_ga4_canonical.md revised 3x in 5 days | CANONICAL_strategy.md with consolidation discipline |
| 9 tools in March → 67 deferred-tool sprawl | Tool-map reconciliation rule + Step 0 RAG pre-flight |
| # | Pattern | Severity now | Evidence |
|---|---|---|---|
| D | Pending-task graveyard | ACCELERATING | Legacy 302 lines → current 448 lines (today). Same disease, higher accretion. |
| A | Cross-domain drift (not sprawl) | RESIDUAL | 4 of 15 specialist sessions absorbing out-of-domain work (32, 25, 38, 49) |
| X | Invisible async fan-out | NEW DIAGNOSIS | 80 TaskCreate dispatches in 75h via raw thread — invisible at session-file layer |
| B | Recursive self-modification | SLOWING | S49 shipped 7 OS versions in one session; feedback_canonical_aspect_banking.md written into legacy 6d post-migration |
| I | Hook-bypass normalization | MOSTLY FIXED | Recent 3-day raw thread: 0 bypasses. Deterministic gates working. |
| C | Idle-time aversion | RESIDUAL | S38B dispatched 3 agents during Workspace block; bank entries' “Next action” never reads “wait” |
| E | Docs-after-trying | RESIDUAL | S50: 8+ OAuth attempts before reading service contract; Connections Registry exists, reflex patchy |
| H | Strategy never deploys | CALUM-GATED | S20 honeypot 5wk pause; S46 SEO 3wk at deploy gate. Calum-blocker, not Claude-blocker. |
| F | Stop-loss asymmetry | RESIDUAL | $ cost tracked cleanly (CPC ceiling reverted at +121% CPA). Hour cost not tracked. |
| G | Volume-as-value hoarding | LEGACY ERA | S25: 9,850 assets indexed, only 27 actioned. Less central in current work. |
| J | Specialist humility | COUNTER-PATTERN (KEEP) | S36 (legal), S47 (finance), S37 (IT) — calibrated hedging, no faked expertise |
| K | Clean closure when externally bounded | COUNTER-PATTERN (KEEP) | S52: 32 files, £7.30 under £8 cap, ENDED. Hardware-limit accepted. |
Symptom: pending-tasks-registry.md at 448 lines and growing. Add rate > close rate. 22 Calum-gated tasks. The 15-min UI task blocked 3 weeks with no escalation.
Proposal: retire the markdown registry; promote GSD to sole owner.
READY/NOT STARTED/TO BUILD/DEFERRED auto-escalates: (a) close, (b) escalate to Calum with kill/keep ask, or (c) dated trigger. Weekly cron via tools/triage_pending_tasks.py.Why this works: the registry is currently doing two jobs (queue + audit log). GSD does the queue better; banks + memory do audit better. The markdown is duplication.
Symptom: 4 of 15 specialist sessions are absorbing out-of-domain work. S32 (agent build) absorbed registry + canonical messaging. S25 (MarCom) absorbed cascade + dashboards + master-sheet sync. S38 (Shopify) ran 7 article variants in parallel.
Proposal: add a 16th “supervisor” session that routes incoming work to the correct specialist.
domain: + refuses_outside_domain: true to frontmatter. When a worker is asked to do off-domain work, it MUST bounce back to the supervisor.F:/Agentic-OS/.claude/multi-session-status.jsonl — every worker appends a status line on /bank with files-touched. Supervisor watches for collisions.absorbed_sessions: [N, M] frontmatter requires same-day matching absorbed_into. No silent absorption.Why this works: the 15-specialist model is the right architecture. The missing piece is a coordinator. Currently the “coordinator” is Calum running 15 windows; the supervisor session relieves that bottleneck.
Symptom: raw .jsonl thread e3c57133 showed 80 TaskCreate dispatches in 75 hours (~1 every 55 min). Session files would say “dispatched a few agents.” The orchestration density is hidden at the layer humans read.
Proposal: instrument the fan-out + impose a daily dispatch budget per session.
TaskCreate writes a line to .claude/fanout-log.jsonl with {session, ts, task_subject, parent_session}./bank + start a fresh thread.codex_qc.py --mode agent-prompt review first.Why this works: “you can't manage what you don't measure.” The dispatch volume is real production work, but invisible. Surfacing it lets Calum see when a session is fan-out-bombing.
Gemini 2.5 Pro’s “Time-Out-On-Block (TOOB)” protocol. When a task is blocked on Calum >24h, the only valid next action is STATUS: BLOCKED_ESCALATED — no adjacent exploration.
Currently S38B and S47 are routing around Calum-blocks by generating execution theatre. The instance must become comfortable with waiting. The supervisor session enforces this.
PostToolUse hook (.claude/hooks/fanout-counter.js) on TaskCreate live. tools/fanout_summary.py aggregates into the dashboard JSON. Read-only. Baseline forms over the next 7 days. View live density →
Three read-only scripts on the existing markdown registry: triage_pending_tasks.py, calum_decision_packet.py, check_pending_task.py. New Triage tab in the pipeline kanban renders all three. No migration, no in-session disruption. View Triage tab →
Add domain: + refuses_outside_domain: frontmatter to all 15 active sessions. Ship the shared multi-session-status.jsonl with /bank hook integration. Still no supervisor — just visibility.
Spin up Session 60 (or rename one MISK) as the supervisor. Specialist domain: routing. It reads multi-session-status.jsonl + GSD + frontmatters, surfaces drift, suggests handoffs. Calum still does final routing — supervisor just recommends.
Re-open the Triage tab and check the +14d success criteria: stale count low, Calum-blocked dropped, missing-acceptance trending down. If not trending well, that's the trigger to revisit Fix 1 with a more aggressive option (A: build cross-session task store, or B: external system like Linear).
Add TOOB enforcement: 24h block timer. Add the 20-dispatch-per-session daily budget. Both have soft warnings first, hard stops only after 2 weeks of fanout-summary baseline.
Most of his published work is conceptually clarifying but not directly liftable. The one genuinely portable artefact: karpathy/autoresearch (March 2026) — frozen substrate + one mutable file + markdown prompt + scalar score. Loop: edit → 5-min run → score → keep/discard. 700 experiments / 2 days / 20 wins. This pattern maps onto your .claude/skills/ self-improvement directly — your meta-skill-creator already has the eval harness; what's missing is the overnight hill-climbing loop running it.
He also runs “tmux grids of agents with watcher scripts” (Fortune, March 2026) — that's literally your 15-session setup, but he's brute-forcing it without architecture. Your Command Centre + session-thread-register is ahead of where Karpathy is publicly. The LLM Wiki gist + Software 3.0 framing are useful vocabulary, not installable code.
| # | Technique | Why this one | Reference |
|---|---|---|---|
| 1 | Eval-Driven Development (Braintrust pattern) | You have 15 specialist sessions but no eval harness on their outputs. Add Braintrust-style eval set per session — compliance pass-rate, copy-quality vs canonical, schema validity. Every CLAUDE.md change becomes measured, not guessed. Highest ROI move. | braintrust.dev |
| 2 | Sleep-Time Compute (Letta) | You already have inbox/cron. Add nightly job per session that re-reads day's transcripts, generates Reflexion-style reflections, pre-computes likely RAG queries. +18% accuracy, 2.5× cost reduction reported. | arxiv 2504.13171 |
| 3 | Multi-Agent Peer Review with Reliability Weighting (A-HMAD) | You already invoke 3 LLMs. Upgrade by tracking each reviewer's historical agreement-with-Calum per task class (compliance / code / copy / strategy) and weighting votes accordingly. +4-6% accuracy lift reported. | ACL 2025 |
| HM | Voyager-style auto-skill library | Your .claude/skills/ is one cron away from auto-extending when a pattern repeats 3+ times. Honourable mention. | arxiv 2305.16291 |
Your Opus + Codex 5.5 + Gemini 2.5 Pro pattern is the canonical “LLM Jury” (Cohere PoLL) + “Mixture of Agents” (Together AI, ICLR 2025 Spotlight) structure. 3 independent model families (Anthropic / OpenAI / Google), adversarial role assignment, verbatim surfacing — you nailed the structure. References: MoA, PoLL.
What you're doing better than research:
feedback_fix_codex_highs_dont_halt discipline — research finds many panels halt and lose velocityThree gaps research says to close:
tools/qc_calibration/ with known-correct judgements. Run quarterly through the panel. Without it, panel agreement = unmeasured confidence. With it, weight each reviewer per domain.Each card has 3-state buttons (click once = selected, twice = confirmed, thrice = clear) and an auto-saving feedback textarea. Use the Send Feedback bar at the bottom of the page to push your decisions to the Inbox — matches the canonical pattern used across all dashboards (_shared/feedback-widget.js).
| Path | What |
|---|---|
F:\Agentic-OS\tmp\dream-review-synthesis-2026-05-14.md | Synthesis sent to Codex + Gemini |
F:\Agentic-OS\context\sessions\session-21-cleanup-and-planning.md | Largest historical session — identified as “the genome” for 9 of 11 patterns |
F:\Claude Root\memory\feedback_behaviour.md | Oldest feedback file (Mar 15) — Calum’s 7 founding complaints |
F:\Claude Root\sessions\67218b35-*.jsonl | Oldest raw thread (Mar 14) — original sin: ended with Claude symlinking its own filesystem |