v4 → n8n Migration

SESS 49D · PLAN v3 CANONICAL

Phased migration of the Brainzyme v4 ads agent from Claude-orchestrated agentic execution into n8n deterministic workflows. Hermes (Qwen-powered) drafts; three-reviewer QC (Codex + Gemini + Qwen) gates; Claude applies. This dashboard captures decisions, open questions, and feedback.

Status: ✓ in flight (GSD Spine T-ffb3b2)
Plan: PLAN.md v3 (canonical)
QC: 3-reviewer (Codex + Gemini + Qwen)
Provider: Featherless (until 2026-05-25) → OpenRouter
Last updated: 2026-05-19

1 · Where we are right now 7 decisions

The plan went through two QC rounds. PLAN.md v2 was reviewed by Codex and Gemini and refined. Your six points of feedback (Telegram, DeepSeek, morning brief, Hermes meta-orchestration, file SoT, Railway/Cloudflare) then produced CHANGES-v3.md, which also went through Codex and Gemini; both said REFINE. The reviewers caught real architecture mismatches I missed; the refined v4 sits below in Section 3.

Decision state (compact)

Decision	State	Note
Migration strategy: Strangler Fig, bottom-up	CONFIRMED	Anthropic / Microsoft Conductor / Vellum consensus pattern. Industry-validated.
HITL channel: Telegram (was Google Chat)	CONFIRMED 2026-05-19	2-way comms; needs webhook auth fix (see §3 #1).
Workflows-as-code path: `F:/Agentic-OS/n8n/workflows/`	CONFIRMED	Nightly CI export via `tools/n8n_workflow_sync.py`.
Session tracking home: 49d (no new 49e)	CONFIRMED 2026-05-19	49e is now GSD Spine M2 rollout (different workstream).
Runner contract	REVERTED to A	v3 had moved to Option E (Joe SEO Coolify). Gemini HIGH: Windows-native tools break on Linux. Revert to Option A (Calum local + Tailscale) for Phase 0-3.
PAUSED HITL carveout	DEFAULT ON	Revisit after 4 weeks of stable ENABLE HITL.
GSD Spine M2 integration (NEW)	NEEDS YOUR CALL	Use the Spine as the canonical task surface for this migration? See §2.

QC verdict trail

v2 PLAN.md — Codex REFINE / Gemini REFINE — 6 HIGHs (runner substrate, PAUSED HITL, sheet-write wrap, shadow-run mechanism, credentials SSOT, verification CLI). All folded into v2.
CHANGES-v3.md — Codex REFINE (4 HIGH) / Gemini REFINE (3 HIGH). Converged on: Windows/Linux mismatch on Option E; compliance-gate substitution = legal risk; Hermes adoption contradicts my own VIDEO-RESEARCH.md; Telegram webhook + Bearer auth incompatible; data.json is cache not SoT; Hermes git-write contradicts safety boundary.
v4 refined — narrows Phase 0 to ~1.5 weeks, quarantines Hermes + DeepSeek as parallel R&D pilots, reverts runner to Option A.

2 · GSD Spine M2 — new context that changes things JUST SURFACED

You mentioned 49E is rolling out M2 of the GSD Task Spine. Confirmed by reading projects/briefs/gsd-task-spine/.planning/2026-05-16-m2-freshness.md and 2026-05-19-m5-surface-migration.md. This is a SQLite task store at F:/Agentic-OS/.command-centre/tasks.db, with CLI at tools/tasks.py. M1 shipped 2026-05-16; M2 adds the trust layer; M5 ingests 4 existing surfaces into it.

What M2 adds (in scope right now)

Freshness signals on every read (verify-before-trust)
Daily cron sweep — reap dead tasks, flag drift
Atomic SQLite snapshots
done becomes a stable state, only left via deliberate reopen
New definition_of_done column + goal-brief command (folds in the native /goal feature)
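
To make the "freshness signals on every read" idea concrete, here is a minimal sketch of what a pure-function freshness check could look like. The function name, the three labels, and the 24-hour threshold are all assumptions for illustration; the real freshness.py behaviour is defined in the M2 plan, not here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold; the actual M2 value is not stated in this dashboard.
STALE_AFTER = timedelta(hours=24)


def freshness(last_verified_at: Optional[datetime], now: datetime) -> str:
    """Classify a task read as 'unverified', 'fresh', or 'stale'.

    Pure function: no database access and no clock reads, so it is easy to test.
    """
    if last_verified_at is None:
        return "unverified"
    age = now - last_verified_at
    return "fresh" if age <= STALE_AFTER else "stale"


now = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
for label, verified in [
    ("never verified", None),
    ("verified 2h ago", now - timedelta(hours=2)),
    ("verified 3d ago", now - timedelta(days=3)),
]:
    print(label, "->", freshness(verified, now))
```

Passing `now` in as an argument, rather than reading the clock inside the function, is what keeps the check pure and lets the daily sweep and the CLI share one implementation.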

Impact on the v4 migration plan

Three concrete changes:

Change #5 (file SoT clarification) resolved. The SQL store I was uncertain about IS the Spine. Update the SoT list to: JSON files · Google Sheets · GSD Spine SQLite (task state) · n8n Postgres (workflow execution state).
All migration phase tasks should live in the Spine from day 1. Each workflow build becomes a row keyed external_ref="v4-n8n:phase0:close-49d-loops" etc. Freshness + verification gates apply naturally. Replaces maintaining a separate task list for this workstream.
Hermes pilot's "outputs patches only" boundary becomes enforceable. Hermes writes Spine task rows; humans review by querying the Spine; on approval, a human-controlled process applies the commit. Spine is the proposal queue.

Net: M2 is helpful infrastructure for this migration. It gives us a trusted SQL task substrate that didn't exist when I wrote v3. If you approve, I'll bake Spine integration into PLAN.md v3 alongside the v4 refinements.
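
As an illustration of the second change, the sketch below shows how migration tasks could be registered as rows keyed by `external_ref`. The table layout, column names other than `external_ref` and `definition_of_done`, and the helper `register_task` are hypothetical; the real schema lives in the Spine and should be driven through tools/tasks.py rather than raw SQL.

```python
import sqlite3

# In-memory stand-in for illustration only; this is NOT the real tasks.db schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        external_ref TEXT UNIQUE NOT NULL,
        title TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'open',
        definition_of_done TEXT
    )
    """
)


def register_task(ref: str, title: str, dod: str) -> None:
    """Idempotent insert: re-running the registration never duplicates a row."""
    conn.execute(
        "INSERT OR IGNORE INTO tasks (external_ref, title, definition_of_done) "
        "VALUES (?, ?, ?)",
        (ref, title, dod),
    )
    conn.commit()


register_task(
    "v4-n8n:phase0:close-49d-loops",
    "Close 49d open follow-ups",
    "env-var rename, --vps flag, n8n MCP repoint, health-check cron all verified",
)
# Registering the same ref again is a no-op thanks to the UNIQUE constraint.
register_task("v4-n8n:phase0:close-49d-loops", "duplicate", "ignored")

rows = conn.execute(
    "SELECT external_ref, status FROM tasks WHERE external_ref LIKE 'v4-n8n:phase0:%'"
).fetchall()
print(rows)
```

A namespaced `external_ref` makes every phase queryable with a prefix match, which is the property the audit-trail argument relies on.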

Deep dive — full M2 task list

M2 tasks (per .planning/2026-05-16-m2-freshness.md):
  Task 1 · verify_task — record a verification outcome
  Task 2 · reopen_task — controlled state transition
  Task 3 · reap_task — daily sweep removes expired ephemerals
  Task 4 · freshness.py — pure-function freshness signals
  Task 5 · reconcile.py — daily sweep entry point
  Task 6 · snapshot.py — atomic SQLite backups
  Task 7 · CLI gains verify/reopen/reconcile/snapshot/dod/goal-brief subcommands
  Task 8 · freshness-annotated get/list output
  Task 9 · definition_of_done column (schema migration)
  Task 10 · write path to set/refine DoD
  Task 11 · goal-brief command (emits /goal-ready brief from a task)

M5 surface migration (next):
  Surface 1 · Markdown pending-tasks registry → ingest
  Surface 2 · Session-file open loops → ingest
  Surface 3 · Claude Code TaskCreate/Update → hook writes
  Surface 4 · Creative Production Tracker Sheet → ingest

3 · The 6 changes (v3 → v4 refined) 2 downgraded

Post-QC refinement of your six feedback points. Two were downgraded to parallel R&D pilots rather than Phase 0 substrate (Codex+Gemini both flagged the risks of bundling them in).

#	Change	v4 verdict	What changed from v3
0	49d open follow-ups (entry gate)	RESTORED	Must close BEFORE Phase 0: env-var rename, `--vps` flag, n8n MCP repoint, 15-min health-check cron. Codex flagged this was missing from v3 sequencing.
1	Telegram HITL	KEEP w/ 4 fixes	(a) Use `TELEGRAM_ALLOWED_USERS` per existing contract; (b) Telegram webhook route exempted from Cloudflare Bearer auth, uses Telegram's `X-Telegram-Bot-Api-Secret-Token` instead; (c) callbacks hit n8n directly via baked-in `webhookId`; (d) `deleteWebhook` on old token if rotating.
2	DeepSeek V4 via OpenRouter	DOWNGRADED	REMOVED compliance scan + GAQL gen from Phase 0-3 routing. ALL LLM substitution moves to a separate Phase 5+ benchmark workstream with your explicit sign-off + locked fixtures per task class. Keyword clustering is the lowest-risk first candidate. Reason: swapping the compliance reasoner mid-migration is legal risk (FSA/ASA/NHC), per both reviewers.
3	Morning brief cron	KEEP w/ 2 fixes	(a) Read Sheet via canonical tooling OR run `pull_pages_registry_from_sheet.py` first with freshness assert — NOT `data.json` raw (it's a cache); (b) downstream workflow re-checks state immediately before mutation (campaign still PAUSED, page still staged, etc.). Stale-state risk surfaced by both reviewers.
4	Hermes + DeepSeek orchestrate the migration	DOWNGRADED	To quarantined R&D pilot. (a) Runs in PARALLEL with Phase 0-1, not as a dependency; (b) Hermes outputs patches/artifacts ONLY — zero git write, zero daemon, no Coolify until pilot passes; (c) pilot scope: ONE workflow (canonical-facts gate), 50+ fixtures over 7 days; (d) promotion gate at end of Phase 1; (e) revise estimate to ~5 days pilot. If pilot fails → stay with Claude in-session for migration work, accept the cost. Honest: v3 reversed my own VIDEO-RESEARCH.md rejection without explaining why; the reversal is "you explicitly asked to try Hermes; pilot is the cheap way to test the thesis without committing the substrate".
5	File/JSON+SQL as SoT (clarification)	RESOLVED via §2	SQL is the GSD Spine (M1 shipped, M2 rolling). 0.5 day to update PLAN.md cross-cutting rule #9 with the four state surfaces.
6	Hosting decision	REVERTED to A	v3 recommended Option E (Joe SEO Coolify). Gemini HIGH: tools are Windows-native (`F:/Agentic-OS/`, `/c/Python314/python.exe`) — Linux Coolify breaks them. Revert to Option A (Calum local + Tailscale) for Phase 0-3. Joe SEO Coolify hosts ONLY Linux-native services (n8n, Hermes pilot if it advances). Revisit Option D (Railway) in Phase 5+ if 24/7 independence becomes critical AND tool refactor is done.

What this means in plain terms: Phase 0 narrows from ~3 weeks back to ~1.5 weeks (basically v2's scope). The speculative changes become parallel R&D with explicit pilot gates. Your cost-saving intent isn't lost — it's sequenced after the migration's chassis is stable, where the risk of substituting a model on a compliance gate is contained.
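
The Telegram fix in change #1 (secret-token header plus user allowlist) can be sketched as follows. This is a hedged illustration, not the n8n workflow itself: the function name and argument shapes are assumptions, and the format of `TELEGRAM_ALLOWED_USERS` is assumed here to be a set of numeric user IDs. The header name and the `from.id` field are part of Telegram's Bot API.

```python
import hmac

SECRET_HEADER = "x-telegram-bot-api-secret-token"  # compared case-insensitively


def is_authorized_update(
    headers: dict, update: dict, expected_secret: str, allowed_users: set
) -> bool:
    """Accept a webhook call only if the secret token matches AND the sender is allowlisted."""
    normalized = {key.lower(): value for key, value in headers.items()}
    supplied = normalized.get(SECRET_HEADER, "")
    # Constant-time comparison avoids leaking the secret through timing differences.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return False
    # Plain messages and inline-button callbacks both carry a "from" object.
    payload = update.get("message") or update.get("callback_query") or {}
    sender_id = payload.get("from", {}).get("id")
    return sender_id in allowed_users


headers = {"X-Telegram-Bot-Api-Secret-Token": "s3cret"}
update = {"callback_query": {"from": {"id": 1001}, "data": "approve"}}
print(is_authorized_update(headers, update, "s3cret", {1001}))
```

Checking the secret before parsing the sender means unauthenticated traffic is rejected without touching the update body, which matters once the route is exempt from Cloudflare Bearer auth.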

4 · Hermes + DeepSeek deep dive backgrounder

Distilled from the 5 video transcripts (mined verbatim via tool-youtube) and the 2-round QC. Full doc: projects/briefs/v4-n8n-migration-2026-05-18/VIDEO-RESEARCH.md.

What's real vs hyperbolic

Claim	Status	Detail
DeepSeek V4 is "100x cheaper" than Claude Opus	REAL	~86x cheaper per output token ($0.87 vs $75 per 1M). Real, but only on inference cost — NOT on quality across all task classes.
"Hermes is an Agentic OS" (video 4)	CONFUSING	The "(Agentic OS)" in the title is the creator's branding for HIS Hermes+Claude hybrid — NOT a reference to our project template. Different thing.
"Claude Agent 2.0" (video 5)	NOT A REAL PRODUCT	Anthropic ships `claude-agent-sdk` (real). The video doesn't actually use it — creator hand-rolled a Python loop. "Claude Agent 2.0" is creator marketing.
DeepSeek matches Claude on algorithmic work	PROBABLY	Verified for GAQL generation, compliance rule scans, keyword clustering. Needs a benchmark before committing; Codex MEDIUM flagged the 1-hour benchmark as too thin.
DeepSeek matches Claude on creative work	NO	Brand voice, 4-class lexicon, ad copy nuance — Claude stays. Per your point #4 2026-05-19.
Hermes is the right framework for our migration	UNKNOWN	Demos were personal productivity (lead gen, morning briefs). NOT structured engineering. Pilot is the cheap way to find out.

The Triad pattern

Conductor (Opus, plans) → Worker (DeepSeek, bulk execution) → Critic (Gemini/GPT, shipping verdict). Same shape as our Cascade Gate B (Codex + Gemini + Opus + Sonnets) but cheaper. Worth considering for Gate B in Phase 5+, once the migration is stable. Don't bundle now.

Pilot pass/fail criteria (refined v4)

Scope: ONE workflow — phase1_check_canonical_facts.workflow.json
Duration: 5 days (was 3) — adds install/security/integration/validation buffer per Codex MEDIUM
Pass: 50+ fixtures across 7 days with shadow-run matching Python script output
Hermes constraints: read-only access to project state; writes Spine task rows (NEW); no git write; no daemon; no live mutations
Fail mode: stay with Claude in-session for migration work, accept the cost; the chassis migration still completes via the v4 refined plan

5 · Cross-references & source docs links

All planning artefacts in F:/Agentic-OS/projects/briefs/v4-n8n-migration-2026-05-18/:

PLAN.md — v2 plan, post-QC (12-week strangler-fig)
OPEN-DECISIONS.md — runner contract Options A/B/C (and D/E added in v3)
VIDEO-RESEARCH.md — synthesis of 5 transcripts (real vs hyperbolic claims)
CHANGES-v3.md — 6-change recompile w/ implementation specifics (pre-this-QC-round)
QC-CACHE/codex-2026-05-18.md + gemini-2026-05-18.md — v2 QC verbatim
video-research/*.en.md — 5 raw transcripts (audit cache)

Related project context:

projects/briefs/gsd-task-spine/.planning/2026-05-16-m2-freshness.md — GSD M2 plan (session 49E)
projects/briefs/gsd-task-spine/.planning/2026-05-19-m5-surface-migration.md — surface ingest plan
context/sessions/session-49d-n8n-smoke-and-sample.md — n8n integration entry session
reference/services/n8n.md — n8n service contract (dual host, webhookId gotcha)
projects/briefs/calum842-vps-migration-2026-05-17/MIGRATION-PLAN.md — VPS migration parent context

6 · Decisions for you 7 cards

Tap a button to indicate direction, type any reasoning in the box, then hit Submit at the bottom. Submission lands in F:/Agentic-OS/inbox/v4-n8n-migration/ and I'll process it next session.

7 · Open questions & theories (no commitment — brainstorming space) free-form

Genuine uncertainties I haven't resolved. Add your thoughts via the cards in §6 or just note here for next session.

Open questions

Telegram bot state: the TELEGRAM_BOT_TOKEN in .env is missing despite the contract at reference/services/telegram.md. Is the bot itself gone (need to mint fresh via @BotFather), or just the key value lost from .env (rotate)? Affects Change #1 start.
49d pending-tasks-registry section: I added a "Session 49d" block during /bank-full. If we approve Spine integration (§6 card 2), should I migrate those 6 entries into Spine rows now, or wait for M5's first ingestor to do it automatically?
Hermes pilot resource hosting: if the pilot advances, Hermes-as-daemon needs Linux hosting (Docker). Three options: (a) Joe SEO Coolify project (recommended — reuses infra), (b) Railway managed PaaS ($5-15/mo), (c) Calum's local Docker. (a) and (c) are zero incremental cost.
DeepSeek benchmark scope: Codex flagged the 1-hour benchmark as too thin. What's the right effort? Proposed per task class: 50 fixtures, schema validation, regression diff vs Claude baseline, at ~1 day per class; 3-4 days total for the keyword/GAQL/compliance triplet.
Cascade Gate B Triad refactor (Phase 5+ stretch): Tempting to refactor the current 4-reviewer pattern to Opus-plan + DeepSeek-check + Gemini-verdict. Cost saving real; risk of regressing the most important compliance gate is also real. Worth a separate plan in Phase 5.

Theories worth checking

The Spine becomes the single migration-work substrate. All Phase 0-5 tasks live there; Hermes (if it pilots successfully) reads from it and writes proposals back; n8n workflows reference Spine task IDs for traceability; cron jobs reconcile state nightly. If this holds, the migration's audit trail is end-to-end queryable.
The runner-contract decision is more bounded than I thought. Option A (Windows-native, Calum local) is the only viable path while tools have hardcoded Windows paths. Option D (Railway) and E (Joe SEO Coolify) both require multi-week tool refactor. Until that refactor is done, Phase 0-4 is stuck on Option A. Worth deciding: do we ever want to do that refactor? If yes, that's a separate workstream worth scoping.
Hermes' real value might NOT be cost reduction. If Hermes generates structurally-fragile n8n JSON that QC rejects 40% of the time, the "85x cheaper" claim collapses (4 retries × DeepSeek cost > 1 attempt × Claude cost). Pilot quantifies this. If rejection rate > 25%, Hermes-as-worker is net-loss.
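
The rejection-rate theory can be quantified with a small sketch. It assumes each draft attempt is independent, so the expected number of attempts per accepted workflow is 1 / (1 - rejection rate), and it reuses the per-workflow OpenRouter figures from the "Cost per migration workflow" section. It covers model spend only; reviewer subscriptions and the time spent reconciling rejected drafts are not modelled.

```python
def expected_cost_per_accepted(
    draft_cost: float, qc_cost: float, rejection_rate: float
) -> float:
    """Expected model spend to get ONE accepted workflow, retrying until QC passes.

    Assumes independent attempts (geometric retries): E[attempts] = 1 / (1 - p).
    """
    if not 0 <= rejection_rate < 1:
        raise ValueError("rejection_rate must be in [0, 1)")
    expected_attempts = 1 / (1 - rejection_rate)
    return (draft_cost + qc_cost) * expected_attempts


DRAFT = 0.0005  # worker draft via OpenRouter, per this dashboard
QC = 0.001      # three-reviewer QC token cost, per this dashboard

for p in (0.0, 0.25, 0.40):
    cost = expected_cost_per_accepted(DRAFT, QC, p)
    print(f"rejection {p:.0%}: ${cost:.4f} per accepted workflow")
```

At a 40% rejection rate the token spend rises from about $0.0015 to about $0.0025 per accepted workflow, so if the pilot shows Hermes to be a net loss, the driver is more likely to be review and reconciliation effort per retry than inference cost.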

9 · Empirical LLM benchmark (2026-05-19) + 3-reviewer QC live LIVE

Ran 9 prompts × 5 candidate models = 45 calls (30 successful — 15 Featherless 429s due to 1-slot concurrency limit). Real money: $0.00150 total. Cost concern essentially solved.

Empirical winner — `qwen/qwen3-235b-a22b-2507`

Model	Pass rate	Avg latency	Total cost
`openrouter:qwen/qwen3-235b-a22b-2507`	8/9 (89%)	9.9s	$0.00022
`openrouter:deepseek/deepseek-v4-flash`	6/9 (67%)	15.3s	$0.00125
`featherless:Qwen/Qwen3-Coder-480B`	3/9*	8.9s	$0
`featherless:deepseek-ai/DeepSeek-R1-0528`	1/9*	51.6s	$0
`featherless:deepseek-ai/DeepSeek-V3.2`	0/9*	7.9s	$0

*Featherless rate-limited (1 slot at a time); sample biased low. When they ran, Qwen3-Coder-480B was best on n8n JSON (2/3 pass). R1-0528 is very slow (51-159s/call) — only worth it for genuine reasoning.

3-reviewer QC peer added

tools/qwen_qc.py built + smoke-tested. From now on, every plan / decision / agent-prompt QC fires three reviewers in parallel — Codex 5.5 + Gemini 3.1 Pro + Qwen3-235B. A HIGH from any of the three = HIGH. Independent angles, single reconciliation by Claude.
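
The "fire three reviewers in parallel, any HIGH = HIGH" rule might look like the sketch below. The reviewer callables are stubs standing in for the real Codex, Gemini, and Qwen invocations, and the function names are hypothetical; only the escalation rule itself comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor

SEVERITY_RANK = {"NONE": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3}


def run_reviewers(reviewers: dict, artefact: str) -> dict:
    """Call every reviewer concurrently; return {reviewer_name: [severities]}."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, artefact) for name, fn in reviewers.items()}
        return {name: future.result() for name, future in futures.items()}


def overall_severity(findings_by_reviewer: dict) -> str:
    """The worst severity from ANY reviewer wins (a single HIGH makes the verdict HIGH)."""
    worst = "NONE"
    for findings in findings_by_reviewer.values():
        for severity in findings:
            if SEVERITY_RANK[severity] > SEVERITY_RANK[worst]:
                worst = severity
    return worst


# Stub reviewers for illustration; real ones would shell out to each model.
stub_reviewers = {
    "codex": lambda artefact: ["MEDIUM"],
    "gemini": lambda artefact: ["HIGH", "LOW"],
    "qwen": lambda artefact: [],
}
findings = run_reviewers(stub_reviewers, "PLAN.md")
print(findings, "->", overall_severity(findings))
```

Taking the maximum rather than a majority vote is deliberate: the reviewers are meant to cover independent angles, so one reviewer's HIGH is not outvoted by two reviewers who did not look there.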

Dual-provider routing live (`tools/llm_router.py`)

Featherless primary during paid window (until 2026-05-25) — $0 marginal, 1-slot concurrency, 32K ctx cap
OpenRouter fallback (or primary post-expiry) — per-token billing, up to 1M ctx, parallel-safe
Router auto-flips on FEATHERLESS_EXPIRY env date OR rate-limit OR context overflow
Per-call cost logged to cron/logs/llm-costs.jsonl
Compliance / ad-copy / creative-brief / strategic-decision task classes — router REFUSES, caller must use Claude direct
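
The routing rules in the list above reduce to a short decision function. This is a sketch under stated assumptions, not the contents of tools/llm_router.py: the function and exception names are invented, the 32K cap is taken as 32,000 tokens, and the expiry date is treated as inclusive.

```python
from datetime import date

# Task classes the router refuses; the caller must use Claude directly.
CLAUDE_ONLY = {"compliance_scan", "ad_copy", "creative_brief", "strategic_decision"}
FEATHERLESS_CTX_CAP = 32_000  # "32K ctx cap"; exact token count is an assumption


class RefusedTaskClass(Exception):
    """Raised when a task class must not be routed to a substitute model."""


def choose_provider(
    task_class: str,
    prompt_tokens: int,
    today: date,
    featherless_expiry: date,
    rate_limited: bool = False,
) -> str:
    if task_class in CLAUDE_ONLY:
        raise RefusedTaskClass(f"{task_class} must go to Claude direct")
    expired = today > featherless_expiry  # assumes the expiry day itself is still usable
    too_long = prompt_tokens > FEATHERLESS_CTX_CAP
    if expired or rate_limited or too_long:
        return "openrouter"
    return "featherless"


expiry = date(2026, 5, 25)
print(choose_provider("gaql_generation", 4_000, date(2026, 5, 19), expiry))
print(choose_provider("gaql_generation", 4_000, date(2026, 5, 26), expiry))
print(choose_provider("gaql_generation", 50_000, date(2026, 5, 19), expiry))
```

Raising on the Claude-only classes, instead of silently rerouting, keeps the compliance boundary visible at the call site.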

Refined routing table (per benchmark)

migration_worker   → featherless:Qwen3-Coder-480B  / openrouter:deepseek-v4-flash
gaql_generation    → openrouter:qwen3-235b         / openrouter:deepseek-v4-flash
keyword_clustering → openrouter:qwen3-235b         / featherless:Qwen3-235B
structured_extract → openrouter:qwen3-235b         / openrouter:deepseek-v4-flash
reasoning_chain    → featherless:DeepSeek-R1-0528  / openrouter:deepseek-r1
critic_verdict     → featherless:GLM-5.1           / openrouter:deepseek-v4-pro
ad_copy / creative_brief / strategic_decision / compliance_scan → CLAUDE_DIRECT

Full results: projects/briefs/v4-n8n-migration-2026-05-18/llm-router-benchmark-2026-05-19.md

10 · Hermes pilot architecture (locked 2026-05-19) READY TO START

Hermes powered by Qwen3-Coder-480B drafts n8n workflows; three-reviewer QC gates; Claude in-session applies. From the video synthesis: Hermes ↔ Claude share state through the git-tracked codebase, not by calling each other's APIs.

Architecture

                    F:/Agentic-OS/ ── GitHub remote ──┐
                          │                            │
                  ┌───────┴───────┐                    │
                  │               │                    │
         Hermes pulls READ    WRITES via PR ───────────┘
         everything       (proposals only — no main commits)
                  │
                  ▼
         ┌──────────────────────────────────────┐
         │ Hermes daemon (Joe SEO Coolify)      │
         │  hermes.nutritionalproducts.org      │
         │ ┌──────────────────────────────────┐ │
         │ │ Worker:    Qwen3-Coder-480B (FL) │ │
         │ │ Fallback:  DeepSeek V4 Flash (OR)│ │
         │ │ Reasoning: DeepSeek R1-0528 (FL) │ │
         │ │ Critic:    GLM-5.1 (FL)          │ │
         │ │ Memory:    GSD Spine + git       │ │
         │ │ Trigger:   Telegram + cron       │ │
         │ └──────────────────────────────────┘ │
         └────────────┬─────────────────────────┘
                      │ proposes via PR
                      ▼
              ┌───────────────────────────────────┐
              │ Three-reviewer QC (parallel)      │
              │   Codex 5.5 + Gemini + Qwen       │
              └────────────┬──────────────────────┘
                           │ verdicts
                           ▼
              ┌───────────────────────────────────┐
              │ Claude (in-session)               │
              │  - reconciles HIGHs               │
              │  - applies the PR or rejects      │
              │  - moves Spine task forward       │
              └───────────────────────────────────┘

Hermes's sandboxed write zones

n8n/workflows/proposed/*.workflow.json — never n8n/workflows/ directly
projects/briefs/v4-n8n-migration-2026-05-18/hermes-output/<task-id>/ — drafts + briefs
GSD Spine task rows (only its own; status proposed → never done)

Pilot pass/fail

Scope: ONE workflow — phase1_check_canonical_facts.workflow.json
Duration: 5 days install/security + 7-day shadow window
Pass: 3-reviewer convergence (no HIGH from any of Codex/Gemini/Qwen) AND 7-day shadow match on ≥50 fixtures
Fail: stay with Claude in-session for migration drafting (DeepSeek-routed), accept cost. Pilot result lands before May 25 Featherless expiry.
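
The pass criteria above can be expressed as a single gate function. This sketch makes several assumptions that the plan does not spell out: "7-day shadow match" is read as results on at least 7 distinct days, every shadow run must match the Python script output (no tolerance), and fixtures are counted by distinct ID. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ShadowResult:
    fixture_id: str
    run_date: date
    matches_python: bool  # n8n output equals the existing Python script output


def pilot_passes(
    results: list, reviewer_highs: dict, min_fixtures: int = 50, min_days: int = 7
) -> bool:
    """True only if no reviewer raised a HIGH AND the shadow window is clean and large enough."""
    no_highs = all(count == 0 for count in reviewer_highs.values())
    all_match = all(result.matches_python for result in results)
    enough_fixtures = len({result.fixture_id for result in results}) >= min_fixtures
    enough_days = len({result.run_date for result in results}) >= min_days
    return no_highs and all_match and enough_fixtures and enough_days


# Illustrative data: 8 fixtures per day over 7 days = 56 distinct fixtures.
start = date(2026, 5, 19)
results = [
    ShadowResult(f"fx-{day}-{n}", start + timedelta(days=day), True)
    for day in range(7)
    for n in range(8)
]
print(pilot_passes(results, {"codex": 0, "gemini": 0, "qwen": 0}))
```

Gating on both sample count and elapsed days follows the v2 Codex note that "7 days" alone is weak for low-frequency events.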

Cost per migration workflow

Worker draft (~5K tokens): $0 Featherless / $0.0005 OpenRouter
3-reviewer QC (~30K tokens total): ~$0.001 + Codex/Gemini subscriptions
Total per workflow: ~$0.0015 · 20 workflows = ~$0.03 total drafting cost

11 · QC verdicts (verbatim, all rounds) audit

Both rounds of Codex 5.5 + Gemini 3.1 Pro QC, full text. v2 round addressed runner substrate / PAUSED HITL / sheet-write wrap / shadow-run mechanism / credentials SSOT / verification CLI. v3 round addressed Telegram-Cloudflare auth / Windows-Linux mismatch / compliance substitution / Hermes contradiction / data.json SoT violation.

v3 (initial CHANGES-v3) — Codex 5.5 (4 HIGH, 7 MEDIUM, REFINE)

HIGH:
1. data.json is READ CACHE, not SoT. Morning brief reads it directly — violates AGENTS.md 2026-05-15 rule.
2. Routing compliance through DeepSeek mid-migration = publish-critical legal-risk change at the wrong time.
3. Hermes adoption contradicts my own VIDEO-RESEARCH.md which explicitly rejected the framework for this use case.
4. Hermes "writes JSON + commits to git" violates the same doc's "Hermes only proposes" safety boundary.

MEDIUM:
- Telegram env var name mismatch (plan: TELEGRAM_ALLOWED_CHAT_IDS, contract: TELEGRAM_ALLOWED_USERS).
- Approval callback mechanics underspecified.
- OpenRouter already in registry — work is model ID validation, not registry add.
- 1-hour DeepSeek benchmark too thin.
- Joe SEO Coolify single-point-of-failure underplayed.
- Existing 49d open follow-ups skipped in v3 sequencing.
- "3 days + 2 weeks" estimate not credible without install/security/integration/validation breakdown.

v3 — Gemini 3.1 Pro (3 HIGH, 2 MEDIUM, REFINE)

HIGH:
1. Windows-vs-Linux mismatch on Option E. Tools have hardcoded F:/Agentic-OS/ and /c/Python314/python.exe. Linux Coolify breaks them.
2. Telegram webhook can't pass Cloudflare Bearer auth. Telegram API doesn't carry custom Bearer headers. Exempt the webhook path; use Telegram's X-Telegram-Bot-Api-Secret-Token header.
3. Compliance gate substitution is legal risk. Even if fixtures pass technically, Calum must explicitly sign off.

MEDIUM:
- Morning brief stale-state (re-check pre-conditions before mutation).
- SQL location affects network access mapping.

LOW:
- Telegram webhook cleanup when rotating tokens (deleteWebhook on old token).

v2 — Codex 5.5 (1 HIGH, 8 MEDIUM, REFINE)

HIGH:
1. Execution substrate undefined. Joe SEO n8n won't have F:/Agentic-OS/, Python 3.14, .env, etc.

MEDIUM (highlights):
- Joe SEO open loops (49d) must close before workflow migration starts.
- Shadow-run "7 days" weak for low-frequency events — define cutover by time AND sample count.
- Gate workflow contracts underspecified (need JSON I/O schema + PASS/WARN/BLOCK fixtures per gate).
- HITL approval URLs need signed one-time tokens + expiry + replay protection.
- RESEARCH-NOTES.md "to be created" — should exist before approval.

v2 — Gemini 3.1 Pro (3 HIGH, 3 MEDIUM, REFINE)

HIGH:
1. Phase 1 shadow-run protocol — no technical mechanism defined to route same trigger to both Claude and n8n simultaneously.
2. Phase 0 credentials — "all secrets in n8n's credential store" splits the auth SSOT.
3. All phases — missing concrete verification commands.

MEDIUM:
- Concrete file path for CI script.
- Phase 1 compliance gate must WRAP existing Python script, not re-implement.
- Self-answer the 49-checks open question.
