v4 → n8n Migration

SESS 49D · PLAN v3 CANONICAL
Phased migration of the Brainzyme v4 ads agent from Claude-orchestrated agentic execution into n8n deterministic workflows. Hermes (Qwen-powered) drafts; three-reviewer QC (Codex + Gemini + Qwen) gates; Claude applies. This dashboard captures decisions, open questions, and feedback.
Status: ✓ in flight (GSD Spine T-ffb3b2)
Plan: PLAN.md v3 (canonical)
QC: 3-reviewer (Codex + Gemini + Qwen)
Provider: Featherless (until 2026-05-25) → OpenRouter
Last updated: 2026-05-19
1 · Where we are right now · 7 decisions

The plan went through two QC rounds. v2 of PLAN.md was reviewed by Codex + Gemini and refined. Then your six points of feedback (Telegram, DeepSeek, morning brief, Hermes meta-orchestration, file SoT, Railway/Cloudflare) produced CHANGES-v3.md, which also went through Codex + Gemini; both said REFINE. The reviewers caught real architecture mismatches I missed. The refined v4 is in §3 below.

Decision state (compact)

Decision | State | Note
Migration strategy: Strangler Fig, bottom-up | CONFIRMED | Anthropic / Microsoft Conductor / Vellum consensus pattern. Industry-validated.
HITL channel: Telegram (was Google Chat) | CONFIRMED 2026-05-19 | 2-way comms; needs webhook auth fix (see §3 #1).
Workflows-as-code path: F:/Agentic-OS/n8n/workflows/ | CONFIRMED | Nightly CI export via tools/n8n_workflow_sync.py.
Session tracking home: 49d (no new 49e) | CONFIRMED 2026-05-19 | 49e is now GSD Spine M2 rollout (different workstream).
Runner contract | REVERTED to A | v3 had moved to Option E (Joe SEO Coolify). Gemini HIGH: Windows-native tools break on Linux. Revert to Option A (Calum local + Tailscale) for Phase 0-3.
PAUSED HITL carveout | DEFAULT ON | Revisit after 4 weeks of stable ENABLE HITL.
GSD Spine M2 integration (NEW) | NEEDS YOUR CALL | Use the Spine as the canonical task surface for this migration? See §2.

QC verdict trail

  • v2 PLAN.md: Codex REFINE / Gemini REFINE. 6 headline findings, 4 of them rated HIGH (runner substrate, PAUSED HITL, sheet-write wrap, shadow-run mechanism, credentials SSOT, verification CLI). All folded into the refined v2.
  • CHANGES-v3.md: Codex REFINE (4 HIGH) / Gemini REFINE (3 HIGH). Between them they flagged: Windows/Linux mismatch on Option E; compliance-gate substitution = legal risk; Hermes adoption contradicts my own VIDEO-RESEARCH.md; Telegram webhook + Bearer auth incompatible; data.json is a cache, not SoT; Hermes git-write contradicts the safety boundary.
  • v4 refined — narrows Phase 0 to ~1.5 weeks, quarantines Hermes + DeepSeek as parallel R&D pilots, reverts runner to Option A.
2 · GSD Spine M2: new context that changes things · JUST SURFACED

You mentioned 49E is rolling out M2 of the GSD Task Spine. I confirmed this by reading projects/briefs/gsd-task-spine/.planning/2026-05-16-m2-freshness.md and 2026-05-19-m5-surface-migration.md. The Spine is a SQLite task store at F:/Agentic-OS/.command-centre/tasks.db, with a CLI at tools/tasks.py. M1 shipped 2026-05-16; M2 adds the trust layer; M5 ingests 4 existing surfaces into it.

What M2 adds (in scope right now)

  • Freshness signals on every read (verify-before-trust)
  • Daily cron sweep — reap dead tasks, flag drift
  • Atomic SQLite snapshots
  • done becomes a stable state, only left via deliberate reopen
  • New definition_of_done column + goal-brief command (folds in the native /goal feature)
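The first bullet (freshness signals on every read) can be pictured as a pure function over a task's last verification time. The sketch below assumes a `last_verified_at` timestamp and a per-task time-to-live. Those field names and the three labels are hypothetical and are not the actual freshness.py interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class FreshnessSignal:
    label: str                # hypothetical labels: "fresh", "stale" or "unverified"
    age: Optional[timedelta]  # time since the last verification; None if never verified


def freshness(last_verified_at: Optional[datetime], ttl: timedelta, now: datetime) -> FreshnessSignal:
    """Pure function: the caller passes `now`, so there is no hidden clock or database access."""
    if last_verified_at is None:
        return FreshnessSignal("unverified", None)
    age = now - last_verified_at
    return FreshnessSignal("fresh" if age <= ttl else "stale", age)


now = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
day = timedelta(hours=24)
print(freshness(datetime(2026, 5, 19, 9, 0, tzinfo=timezone.utc), day, now).label)  # fresh
print(freshness(datetime(2026, 5, 17, 9, 0, tzinfo=timezone.utc), day, now).label)  # stale
print(freshness(None, day, now).label)                                              # unverified
```

Passing `now` in, rather than reading the clock, keeps the function pure. The same inputs always yield the same signal, and tests need no time mocking.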

Impact on the v4 migration plan

Three concrete changes:
  1. Change #5 (file SoT clarification) resolved. The SQL store I was uncertain about IS the Spine. Update the SoT list to: JSON files · Google Sheets · GSD Spine SQLite (task state) · n8n Postgres (workflow execution state).
  2. All migration phase tasks should live in the Spine from day 1. Each workflow build becomes a row keyed by external_ref (e.g. "v4-n8n:phase0:close-49d-loops"). Freshness + verification gates apply naturally. This replaces maintaining a separate task list for this workstream.
  3. Hermes pilot's "outputs patches only" boundary becomes enforceable. Hermes writes Spine task rows; humans review by querying the Spine; on approval, a human-controlled process applies the commit. Spine is the proposal queue.
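Here is a minimal sketch of the Spine acting as a proposal queue, using an in-memory SQLite table keyed by `external_ref`. The table layout, the `approved`/`rejected`/`open` statuses and the `hermes`/`human` actor names are illustrative assumptions, not the real tasks.db schema or the tools/tasks.py API. The sketch only shows how a transition table can keep Hermes at `proposed` while `done` stays stable until a deliberate reopen.

```python
import sqlite3

# Hypothetical transition table: (from_status, to_status) -> actors allowed to make that move.
ALLOWED = {
    ("proposed", "approved"): {"human"},
    ("proposed", "rejected"): {"human"},
    ("approved", "done"): {"human"},
    ("done", "open"): {"human"},              # reopen: the only way out of done
    ("open", "proposed"): {"hermes", "human"},
}


def connect() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks (external_ref TEXT PRIMARY KEY, status TEXT NOT NULL, author TEXT NOT NULL)"
    )
    return db


def propose(db: sqlite3.Connection, external_ref: str, author: str) -> None:
    # Every new row enters as "proposed"; a duplicate external_ref raises IntegrityError.
    db.execute("INSERT INTO tasks VALUES (?, 'proposed', ?)", (external_ref, author))


def transition(db: sqlite3.Connection, external_ref: str, new_status: str, actor: str) -> None:
    row = db.execute(
        "SELECT status, author FROM tasks WHERE external_ref = ?", (external_ref,)
    ).fetchone()
    if row is None:
        raise KeyError(external_ref)
    status, author = row
    allowed = ALLOWED.get((status, new_status), set())
    # Hermes may only touch rows it authored, and only via the moves listed for it.
    if actor not in allowed or (actor == "hermes" and author != "hermes"):
        raise PermissionError(f"{actor} may not move {external_ref} from {status} to {new_status}")
    db.execute("UPDATE tasks SET status = ? WHERE external_ref = ?", (new_status, external_ref))


db = connect()
ref = "v4-n8n:phase0:close-49d-loops"
propose(db, ref, author="hermes")
transition(db, ref, "approved", actor="human")  # human review of the proposal
transition(db, ref, "done", actor="human")
```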

Net: M2 is helpful infrastructure for this migration. It gives us a trusted SQL task substrate that didn't exist when I wrote v3. If you approve, I'll bake Spine integration into PLAN.md v3 alongside the v4 refinements.

Deep dive — full M2 task list
M2 tasks (per .planning/2026-05-16-m2-freshness.md):
  Task 1 · verify_task — record a verification outcome
  Task 2 · reopen_task — controlled state transition
  Task 3 · reap_task — daily sweep removes expired ephemerals
  Task 4 · freshness.py — pure-function freshness signals
  Task 5 · reconcile.py — daily sweep entry point
  Task 6 · snapshot.py — atomic SQLite backups
  Task 7 · CLI gains verify/reopen/reconcile/snapshot/dod/goal-brief subcommands
  Task 8 · freshness-annotated get/list output
  Task 9 · definition_of_done column (schema migration)
  Task 10 · write path to set/refine DoD
  Task 11 · goal-brief command (emits /goal-ready brief from a task)
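Task 6 (atomic SQLite backups) is commonly done by copying with SQLite's online backup API into a temporary file, then renaming it over the previous snapshot. The sketch below uses Python's `sqlite3.Connection.backup` and `os.replace`. It is an assumption about the approach, not snapshot.py's actual code, and the file paths are hypothetical. Python's docs describe a successful `os.replace` as atomic on POSIX when both paths are on the same filesystem. On Windows it replaces in a single call, but treat strict atomicity there as unverified.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def snapshot(db_path: str, dest_path: str) -> Path:
    """Copy a live SQLite DB to dest_path so readers never see a half-written snapshot."""
    dest = Path(dest_path)
    # The temp file lives in the destination directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=str(dest.parent), suffix=".tmp")
    os.close(fd)
    try:
        src = sqlite3.connect(db_path)
        try:
            dst = sqlite3.connect(tmp)
            try:
                src.backup(dst)  # SQLite online backup API: consistent copy of the source DB
            finally:
                dst.close()
        finally:
            src.close()
        os.replace(tmp, dest)  # swap in the finished file with a single rename
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # never leave a partial snapshot behind
        raise
    return dest
```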

M5 surface migration (next):
  Surface 1 · Markdown pending-tasks registry → ingest
  Surface 2 · Session-file open loops → ingest
  Surface 3 · Claude Code TaskCreate/Update → hook writes
  Surface 4 · Creative Production Tracker Sheet → ingest
3 · The 6 changes (v3 → v4 refined) · 2 downgraded

Post-QC refinement of your six feedback points. Two were downgraded to parallel R&D pilots rather than Phase 0 substrate (Codex+Gemini both flagged the risks of bundling them in).

# | Change | v4 verdict | What changed from v3
0 | 49d open follow-ups (entry gate) | RESTORED | Must close BEFORE Phase 0: env-var rename, --vps flag, n8n MCP repoint, 15-min health-check cron. Codex flagged this as missing from v3 sequencing.
1 | Telegram HITL | KEEP w/ 4 fixes | (a) Use TELEGRAM_ALLOWED_USERS per the existing contract; (b) exempt the Telegram webhook route from Cloudflare Bearer auth and verify Telegram's X-Telegram-Bot-Api-Secret-Token header instead; (c) callbacks hit n8n directly via the baked-in webhookId; (d) call deleteWebhook on the old token if rotating.
2 | DeepSeek V4 via OpenRouter | DOWNGRADED | REMOVED compliance scan + GAQL gen from Phase 0-3 routing. ALL LLM substitution moves to a separate Phase 5+ benchmark workstream, with your explicit sign-off and locked fixtures per task class. Keyword clustering is the lowest-risk first candidate. Reason: swapping the compliance reasoner mid-migration is a legal risk (FSA/ASA/NHC), per both reviewers.
3 | Morning brief cron | KEEP w/ 2 fixes | (a) Read the Sheet via canonical tooling, OR run pull_pages_registry_from_sheet.py first with a freshness assert; NOT raw data.json (it's a cache); (b) the downstream workflow re-checks state immediately before mutation (campaign still PAUSED, page still staged, etc.). Both reviewers surfaced the stale-state risk.
4 | Hermes + DeepSeek orchestrate the migration | DOWNGRADED | To a quarantined R&D pilot. (a) Runs in PARALLEL with Phase 0-1, not as a dependency; (b) Hermes outputs patches/artifacts ONLY: zero git write, zero daemon, no Coolify until the pilot passes; (c) pilot scope: ONE workflow (canonical-facts gate), 50+ fixtures over 7 days; (d) promotion gate at the end of Phase 1; (e) revised estimate: ~5 days for the pilot. If the pilot fails, stay with Claude in-session for migration work and accept the cost. To be candid, v3 reversed my own VIDEO-RESEARCH.md rejection without explaining why. The reason: you explicitly asked to try Hermes, and a pilot is the cheap way to test the thesis without committing the substrate.
5 | File/JSON+SQL as SoT (clarification) | RESOLVED via §2 | SQL is the GSD Spine (M1 shipped, M2 rolling out). 0.5 day to update PLAN.md cross-cutting rule #9 with the four state surfaces.
6 | Hosting decision | REVERTED to A | v3 recommended Option E (Joe SEO Coolify). Gemini HIGH: tools are Windows-native (F:/Agentic-OS/, /c/Python314/python.exe), so Linux Coolify breaks them. Revert to Option A (Calum local + Tailscale) for Phase 0-3. Joe SEO Coolify hosts ONLY Linux-native services (n8n, and the Hermes pilot if it advances). Revisit Option D (Railway) in Phase 5+ if 24/7 independence becomes critical AND the tool refactor is done.
What this means in plain terms: Phase 0 narrows from ~3 weeks back to ~1.5 weeks (basically v2's scope). The speculative changes become parallel R&D with explicit pilot gates. Your cost-saving intent isn't lost — it's sequenced after the migration's chassis is stable, where the risk of substituting a model on a compliance gate is contained.
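Fixes (a) and (b) of Change #1 need two checks at the webhook entry point before any approval is trusted. The first is the secret header Telegram sends when the webhook was registered with `setWebhook`'s `secret_token` parameter. The second compares the sender's user ID against the allowlist. The Python sketch below shows that logic and assumes TELEGRAM_ALLOWED_USERS is a comma-separated list of numeric user IDs; the real contract format may differ. In n8n the same checks would likely sit right after the Webhook trigger.

```python
import hmac
from typing import Optional, Set

SECRET_HEADER = "x-telegram-bot-api-secret-token"  # sent by Telegram when setWebhook had a secret_token


def parse_allowed_users(raw: str) -> Set[int]:
    # Assumption: TELEGRAM_ALLOWED_USERS holds comma-separated numeric Telegram user IDs.
    return {int(part) for part in raw.split(",") if part.strip()}


def sender_id(update: dict) -> Optional[int]:
    # Inline-button presses arrive as callback_query, plain messages as message; both carry "from".
    for key in ("callback_query", "message"):
        if "from" in update.get(key, {}):
            return update[key]["from"]["id"]
    return None


def accept_update(headers: dict, update: dict, expected_secret: str, allowed: Set[int]) -> bool:
    # HTTP header names are case-insensitive, so normalise before the lookup.
    supplied = {k.lower(): v for k, v in headers.items()}.get(SECRET_HEADER, "")
    # Constant-time comparison avoids leaking how much of the secret matched.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return False
    return sender_id(update) in allowed


allowed = parse_allowed_users("111, 222")
```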
4 · Hermes + DeepSeek deep dive · backgrounder

Distilled from the 5 video transcripts (mined verbatim via tool-youtube) and the 2-round QC. Full doc: projects/briefs/v4-n8n-migration-2026-05-18/VIDEO-RESEARCH.md.

What's real vs hyperbolic

Claim | Status | Detail
DeepSeek V4 is "100x cheaper" than Claude Opus | REAL | ~86x cheaper per output token ($0.87 vs $75 per 1M). Real, but only on inference cost, NOT on quality across all task classes.
"Hermes is an Agentic OS" (video 4) | CONFUSING | The "(Agentic OS)" in the title is the creator's branding for HIS Hermes+Claude hybrid, NOT a reference to our project template. Different thing.
"Claude Agent 2.0" (video 5) | NOT A REAL PRODUCT | Anthropic ships claude-agent-sdk (real). The video doesn't actually use it; the creator hand-rolled a Python loop. "Claude Agent 2.0" is creator marketing.
DeepSeek matches Claude on algorithmic work | PROBABLY | Reported for GAQL generation, compliance rule scans and keyword clustering, but needs our own benchmark before committing (Codex MEDIUM flagged the 1-hour benchmark as too thin).
DeepSeek matches Claude on creative work | NO | Brand voice, 4-class lexicon, ad copy nuance: Claude stays. Per your point #4 2026-05-19.
Hermes is the right framework for our migration | UNKNOWN | Demos were personal productivity (lead gen, morning briefs), NOT structured engineering. Pilot is the cheap way to find out.

The Triad pattern

Conductor (Opus, plans) → Worker (DeepSeek, bulk execution) → Critic (Gemini/GPT, shipping verdict). Same shape as our Cascade Gate B (Codex + Gemini + Opus + Sonnets), but cheaper. Worth considering for Gate B in Phase 5+, once the migration is stable. Don't bundle it now.

Pilot pass/fail criteria (refined v4)

  • Scope: ONE workflow — phase1_check_canonical_facts.workflow.json
  • Duration: 5 days of install/security/integration/validation (was 3; buffer added per Codex MEDIUM), followed by the 7-day shadow window
  • Pass: 50+ fixtures across 7 days, with shadow-run output matching the Python script's output
  • Hermes constraints: read-only access to project state; writes Spine task rows (NEW); no git write; no daemon; no live mutations
  • Fail mode: stay with Claude in-session for migration work, accept the cost; the chassis migration still completes via the v4 refined plan
5 · Cross-references & source docs · links

All planning artefacts in F:/Agentic-OS/projects/briefs/v4-n8n-migration-2026-05-18/:

  • PLAN.md — v2 plan, post-QC (12-week strangler-fig)
  • OPEN-DECISIONS.md — runner contract Options A/B/C (and D/E added in v3)
  • VIDEO-RESEARCH.md — synthesis of 5 transcripts (real vs hyperbolic claims)
  • CHANGES-v3.md — 6-change recompile w/ implementation specifics (pre-this-QC-round)
  • QC-CACHE/codex-2026-05-18.md + gemini-2026-05-18.md — v2 QC verbatim
  • video-research/*.en.md — 5 raw transcripts (audit cache)

Related project context:

  • projects/briefs/gsd-task-spine/.planning/2026-05-16-m2-freshness.md — GSD M2 plan (session 49E)
  • projects/briefs/gsd-task-spine/.planning/2026-05-19-m5-surface-migration.md — surface ingest plan
  • context/sessions/session-49d-n8n-smoke-and-sample.md — n8n integration entry session
  • reference/services/n8n.md — n8n service contract (dual host, webhookId gotcha)
  • projects/briefs/calum842-vps-migration-2026-05-17/MIGRATION-PLAN.md — VPS migration parent context
6 · Decisions for you · 7 cards

Tap a button to indicate direction, type any reasoning in the box, then hit Submit at the bottom. Submission lands in F:/Agentic-OS/inbox/v4-n8n-migration/ and I'll process it next session.

7 · Open questions & theories (no commitment; brainstorming space) · free-form

Genuine uncertainties I haven't resolved. Add your thoughts via the cards in §6, or note them here for next session.

Open questions

  • Telegram bot state: TELEGRAM_BOT_TOKEN is missing from .env, despite the contract at reference/services/telegram.md. Is the bot itself gone (mint a fresh one via @BotFather), or was only the token value lost from .env (rotate it)? The answer affects when Change #1 can start.
  • 49d pending-tasks-registry section: I added a "Session 49d" block during /bank-full. If we approve Spine integration (§6 card 2), should I migrate those 6 entries into Spine rows now, or wait for M5's first ingestor to do it automatically?
  • Hermes pilot resource hosting: if the pilot advances, Hermes-as-daemon needs Linux hosting (Docker). Three options: (a) Joe SEO Coolify project (recommended — reuses infra), (b) Railway managed PaaS ($5-15/mo), (c) Calum's local Docker. (a) and (c) are zero incremental cost.
  • DeepSeek benchmark scope: Codex flagged the 1-hour benchmark as too thin. What's the right effort? My proposal, per task class: 50 fixtures, schema validation, and a regression diff against a Claude baseline, at ~1 day per class. That comes to 3-4 days total for the keyword/GAQL/compliance triplet.
  • Cascade Gate B Triad refactor (Phase 5+ stretch): Tempting to refactor the current 4-reviewer pattern to Opus-plan + DeepSeek-check + Gemini-verdict. Cost saving real; risk of regressing the most important compliance gate is also real. Worth a separate plan in Phase 5.

Theories worth checking

  • The Spine becomes the single migration-work substrate. All Phase 0-5 tasks live there; Hermes (if it pilots successfully) reads from it and writes proposals back; n8n workflows reference Spine task IDs for traceability; cron jobs reconcile state nightly. If this holds, the migration's audit trail is end-to-end queryable.
  • The runner-contract decision is more bounded than I thought. Option A (Windows-native, Calum local) is the only viable path while tools have hardcoded Windows paths. Option D (Railway) and E (Joe SEO Coolify) both require multi-week tool refactor. Until that refactor is done, Phase 0-4 is stuck on Option A. Worth deciding: do we ever want to do that refactor? If yes, that's a separate workstream worth scoping.
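  • Hermes' real value might NOT be cost reduction. If Hermes generates structurally fragile n8n JSON that QC rejects 40% of the time, each accepted workflow takes 1/(1 − 0.4) ≈ 1.7 attempts on average. Worker inference alone cannot erase the ~86x price gap (even 4 DeepSeek attempts cost under 5% of one Claude attempt). So the saving collapses only if every rejected attempt also pays for a full 3-reviewer QC round plus Claude reconciliation. The pilot quantifies this. If the rejection rate exceeds 25%, Hermes-as-worker is likely a net loss, though that threshold depends on the real per-attempt QC and reconciliation cost.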
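Here is a worked version of the rejection-rate arithmetic. It rests on these assumptions:
- prices from §4 ($0.87 vs $75 per 1M output tokens);
- a ~5K-token draft, as in §10;
- each draft passing QC independently with probability 1 − r, so expected attempts are 1/(1 − r);
- Claude's draft accepted first time;
- a hypothetical per-attempt review cost covering the 3-reviewer QC round and Claude's reconciliation.

It shows that the break-even rejection rate is driven mainly by that review cost, not by the worker's token price.

```python
def expected_attempts(rejection_rate: float) -> float:
    # Geometric model: each attempt passes QC independently with probability 1 - r.
    if not 0 <= rejection_rate < 1:
        raise ValueError("rejection_rate must be in [0, 1)")
    return 1 / (1 - rejection_rate)


def cost_per_accepted(draft_tokens: int, price_per_m: float, review_cost: float, rejection_rate: float) -> float:
    # Token cost of one draft plus the (hypothetical) review cost, times expected attempts.
    per_attempt = draft_tokens / 1_000_000 * price_per_m + review_cost
    return per_attempt * expected_attempts(rejection_rate)


def break_even_rejection(draft_tokens: int, cheap_price: float, claude_price: float, review_cost: float) -> float:
    """Rejection rate at which the cheap worker costs as much as one first-time-accepted Claude draft."""
    claude = cost_per_accepted(draft_tokens, claude_price, review_cost, 0.0)
    cheap_attempt = draft_tokens / 1_000_000 * cheap_price + review_cost
    return 1 - cheap_attempt / claude


TOKENS, DEEPSEEK, OPUS = 5_000, 0.87, 75.0  # draft size and $ per 1M output tokens, from this page
print(round(expected_attempts(0.4), 2))                              # 1.67
print(round(break_even_rejection(TOKENS, DEEPSEEK, OPUS, 0.0), 4))   # 0.9884 (review treated as free)
print(round(break_even_rejection(TOKENS, DEEPSEEK, OPUS, 0.50), 4))  # 0.4236 (hypothetical $0.50 review)
```

With review treated as free, the cheap worker stays ahead until almost every draft is rejected. With an illustrative (not measured) $0.50 per-attempt review cost, break-even falls to about 42%. The pilot's job is to measure both r and the real per-attempt review cost.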
9 · Empirical LLM benchmark (2026-05-19) + 3-reviewer QC · LIVE

Ran 9 prompts × 5 candidate models = 45 calls. 30 succeeded; the other 15 were Featherless HTTP 429s caused by its 1-slot concurrency limit. Real money: $0.00150 total, so inference cost is essentially a solved concern.

Empirical winner — `qwen/qwen3-235b-a22b-2507`

Model | Pass rate | Avg latency | Total cost
openrouter:qwen/qwen3-235b-a22b-2507 | 8/9 (89%) | 9.9s | $0.00022
openrouter:deepseek/deepseek-v4-flash | 6/9 (67%) | 15.3s | $0.00125
featherless:Qwen/Qwen3-Coder-480B | 3/9* | 8.9s | $0
featherless:deepseek-ai/DeepSeek-R1-0528 | 1/9* | 51.6s | $0
featherless:deepseek-ai/DeepSeek-V3.2 | 0/9* | 7.9s | $0

*Featherless was rate-limited (1 request at a time), so its sample is small and pass rates are biased low. When its models did run, Qwen3-Coder-480B was best on n8n JSON (2/3 pass). R1-0528 is very slow (51-159s/call); it's only worth it for genuine reasoning.

3-reviewer QC peer added

tools/qwen_qc.py built + smoke-tested. From now on, every plan / decision / agent-prompt QC fires three reviewers in parallel — Codex 5.5 + Gemini 3.1 Pro + Qwen3-235B. A HIGH from any of the three = HIGH. Independent angles, single reconciliation by Claude.
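The "HIGH from any of the three = HIGH" rule amounts to fanning out in parallel and taking the worst severity. Below is a sketch with stub reviewers. The `Reviewer` callable shape and the finding tuples are assumptions, since tools/qwen_qc.py's interface isn't described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

SEVERITIES = ["LOW", "MEDIUM", "HIGH"]
Finding = Tuple[str, str]                  # (severity, note)
Reviewer = Callable[[str], List[Finding]]  # hypothetical wrapper around one reviewer's CLI/API


def run_qc(artifact: str, reviewers: Dict[str, Reviewer]) -> dict:
    # Fan out: every reviewer sees the same artifact, independently and in parallel.
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(review, artifact) for name, review in reviewers.items()}
        findings = {name: fut.result() for name, fut in futures.items()}
    severities = [sev for items in findings.values() for sev, _ in items]
    # Worst severity wins, so a HIGH from any single reviewer makes the verdict HIGH.
    overall: Optional[str] = max(severities, key=SEVERITIES.index) if severities else None
    high_from = sorted(name for name, items in findings.items() if any(sev == "HIGH" for sev, _ in items))
    return {"overall": overall, "high_from": high_from, "findings": findings}


stubs = {
    "codex": lambda a: [("MEDIUM", "approval callback mechanics underspecified")],
    "gemini": lambda a: [],
    "qwen": lambda a: [("HIGH", "webhook cannot carry Bearer auth")],
}
print(run_qc("PLAN.md", stubs)["high_from"])  # ['qwen']
```

Claude then reconciles the pooled findings once. The aggregation itself stays mechanical, so no single reviewer can be outvoted on a HIGH.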

Dual-provider routing live (tools/llm_router.py)

  • Featherless primary during paid window (until 2026-05-25) — $0 marginal, 1-slot concurrency, 32K ctx cap
  • OpenRouter fallback (or primary post-expiry) — per-token billing, up to 1M ctx, parallel-safe
  • Router auto-flips to OpenRouter when the FEATHERLESS_EXPIRY env date passes, on a rate-limit, or on context overflow
  • Per-call cost logged to cron/logs/llm-costs.jsonl
  • Compliance / ad-copy / creative-brief / strategic-decision task classes — router REFUSES, caller must use Claude direct

Refined routing table (per benchmark)

migration_worker   → featherless:Qwen3-Coder-480B  / openrouter:deepseek-v4-flash
gaql_generation    → openrouter:qwen3-235b         / openrouter:deepseek-v4-flash
keyword_clustering → openrouter:qwen3-235b         / featherless:Qwen3-235B
structured_extract → openrouter:qwen3-235b         / openrouter:deepseek-v4-flash
reasoning_chain    → featherless:DeepSeek-R1-0528  / openrouter:deepseek-r1
critic_verdict     → featherless:GLM-5.1           / openrouter:deepseek-v4-pro
ad_copy / creative_brief / strategic_decision / compliance_scan → CLAUDE_DIRECT

Full results: projects/briefs/v4-n8n-migration-2026-05-18/llm-router-benchmark-2026-05-19.md
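The flip rules and refusals above can be sketched as one selection function. This is a hypothetical illustration, not tools/llm_router.py. The function and exception names are invented, and only three rows of the routing table are included. The 32K cap is approximated as 32,000 tokens, and treating the expiry date itself as expired is an assumption.

```python
from datetime import date

CLAUDE_DIRECT = {"ad_copy", "creative_brief", "strategic_decision", "compliance_scan"}
FEATHERLESS_CTX_TOKENS = 32_000  # approximation of the 32K Featherless context cap

# Subset of the routing table: task class -> (primary, fallback).
ROUTES = {
    "migration_worker": ("featherless:Qwen3-Coder-480B", "openrouter:deepseek-v4-flash"),
    "gaql_generation": ("openrouter:qwen3-235b", "openrouter:deepseek-v4-flash"),
    "reasoning_chain": ("featherless:DeepSeek-R1-0528", "openrouter:deepseek-r1"),
}


class RouteRefused(Exception):
    """Raised for task classes that must be sent to Claude directly."""


def choose_model(task_class: str, prompt_tokens: int, today: date,
                 featherless_expiry: date, rate_limited: bool = False) -> str:
    if task_class in CLAUDE_DIRECT:
        raise RouteRefused(f"{task_class}: caller must use Claude direct")
    primary, fallback = ROUTES[task_class]
    # Flip off Featherless on expiry (the expiry day itself counts as expired: an assumption),
    # after a rate-limit, or when the prompt would overflow the context cap.
    if primary.startswith("featherless:") and (
        today >= featherless_expiry or rate_limited or prompt_tokens > FEATHERLESS_CTX_TOKENS
    ):
        return fallback
    return primary
```

Raising instead of silently falling back for the Claude-direct classes means a misrouted compliance or ad-copy call fails loudly at the caller.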

10 · Hermes pilot architecture (locked 2026-05-19) · READY TO START

Hermes powered by Qwen3-Coder-480B drafts n8n workflows; three-reviewer QC gates; Claude in-session applies. From the video synthesis: Hermes ↔ Claude share state through the git-tracked codebase, not by calling each other's APIs.

Architecture

                    F:/Agentic-OS/ ── GitHub remote ──┐
                          │                            │
                  ┌───────┴───────┐                    │
                  │               │                    │
         Hermes pulls READ    WRITES via PR ───────────┘
         everything       (proposals only — no main commits)
                  │
                  ▼
         ┌──────────────────────────────────────┐
         │ Hermes daemon (Joe SEO Coolify)      │
         │  hermes.nutritionalproducts.org      │
         │ ┌──────────────────────────────────┐ │
         │ │ Worker:    Qwen3-Coder-480B (FL) │ │
         │ │ Fallback:  DeepSeek V4 Flash (OR)│ │
         │ │ Reasoning: DeepSeek R1-0528 (FL) │ │
         │ │ Critic:    GLM-5.1 (FL)          │ │
         │ │ Memory:    GSD Spine + git       │ │
         │ │ Trigger:   Telegram + cron       │ │
         │ └──────────────────────────────────┘ │
         └────────────┬─────────────────────────┘
                      │ proposes via PR
                      ▼
              ┌───────────────────────────────────┐
              │ Three-reviewer QC (parallel)      │
              │   Codex 5.5 + Gemini + Qwen       │
              └────────────┬──────────────────────┘
                           │ verdicts
                           ▼
              ┌───────────────────────────────────┐
              │ Claude (in-session)               │
              │  - reconciles HIGHs               │
              │  - applies the PR or rejects      │
              │  - moves Spine task forward       │
              └───────────────────────────────────┘

Hermes's sandboxed write zones

  • n8n/workflows/proposed/*.workflow.json — never n8n/workflows/ directly
  • projects/briefs/v4-n8n-migration-2026-05-18/hermes-output/<task-id>/ — drafts + briefs
  • GSD Spine task rows (only its own; status proposed, never done)
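Here is a sketch of a path guard for the two file zones above, checking repo-relative paths before Hermes writes anything. The function name and the rule that a `proposed/` workflow may not sit in a subdirectory are assumptions. The Spine-row rule would be enforced in the task store, not here.

```python
from pathlib import PurePosixPath

HERMES_OUTPUT = ("projects", "briefs", "v4-n8n-migration-2026-05-18", "hermes-output")


def hermes_may_write(rel_path: str) -> bool:
    """True only for the two file zones listed above (paths relative to the repo root)."""
    p = PurePosixPath(rel_path.replace("\\", "/"))  # accept Windows-style separators too
    if p.is_absolute() or ".." in p.parts:
        return False  # no absolute paths and no climbing out of a zone
    parts = p.parts
    # Zone 1: n8n/workflows/proposed/<name>.workflow.json, with no subdirectories
    if parts[:3] == ("n8n", "workflows", "proposed") and len(parts) == 4:
        return parts[3].endswith(".workflow.json")
    # Zone 2: .../hermes-output/<task-id>/<file or subpath>
    return parts[:4] == HERMES_OUTPUT and len(parts) >= 6
```

Rejecting `..` outright, rather than normalising it away, keeps the guard simple: any path that tries to climb is suspicious by definition.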

Pilot pass/fail

  • Scope: ONE workflow — phase1_check_canonical_facts.workflow.json
  • Duration: 5 days install/security + 7-day shadow window
  • Pass: 3-reviewer convergence (no HIGH from any of Codex/Gemini/Qwen) AND 7-day shadow match on ≥50 fixtures
  • Fail: stay with Claude in-session for migration drafting (DeepSeek-routed), accept cost. Pilot result lands before May 25 Featherless expiry.
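The pass rule combines QC convergence with a shadow window bounded by both time and sample count; the v2 Codex MEDIUM asked for both. Below is a sketch of that check. The data shapes are hypothetical. Requiring zero mismatches is also an assumption, since the plan says "shadow match" without stating a tolerance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ShadowResult:
    at: datetime   # when the shared trigger fired
    matched: bool  # n8n output equalled the Python script's output for that trigger


def pilot_passes(results: List[ShadowResult], highs_by_reviewer: Dict[str, int],
                 min_fixtures: int = 50, window: timedelta = timedelta(days=7)) -> Tuple[bool, str]:
    # Condition 1: 3-reviewer convergence, i.e. zero HIGH findings from every reviewer.
    flagged = sorted(name for name, highs in highs_by_reviewer.items() if highs)
    if flagged:
        return False, "HIGH from " + ", ".join(flagged)
    # Condition 2: enough fixtures AND a long enough window (time and sample count, not either).
    if len(results) < min_fixtures:
        return False, f"only {len(results)} fixtures, need {min_fixtures}"
    span = max(r.at for r in results) - min(r.at for r in results)
    if span < window:
        return False, f"shadow window {span} is shorter than {window}"
    mismatches = sum(1 for r in results if not r.matched)
    if mismatches:  # assumption: zero tolerance for shadow mismatches
        return False, f"{mismatches} mismatched fixtures"
    return True, "pass"
```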

Cost per migration workflow

  • Worker draft (~5K tokens): $0 Featherless / $0.0005 OpenRouter
  • 3-reviewer QC (~30K tokens total): ~$0.001 + Codex/Gemini subscriptions
  • Total per workflow: ~$0.0015 · 20 workflows = ~$0.03 total drafting cost
11 · QC verdicts (verbatim, all rounds) · audit

Both rounds of Codex 5.5 + Gemini 3.1 Pro QC, full text. v2 round addressed runner substrate / PAUSED HITL / sheet-write wrap / shadow-run mechanism / credentials SSOT / verification CLI. v3 round addressed Telegram-Cloudflare auth / Windows-Linux mismatch / compliance substitution / Hermes contradiction / data.json SoT violation.

v3 (initial CHANGES-v3) — Codex 5.5 (4 HIGH, 7 MEDIUM, REFINE)
HIGH:
1. data.json is READ CACHE, not SoT. Morning brief reads it directly — violates AGENTS.md 2026-05-15 rule.
2. Routing compliance through DeepSeek mid-migration = publish-critical legal-risk change at the wrong time.
3. Hermes adoption contradicts my own VIDEO-RESEARCH.md which explicitly rejected the framework for this use case.
4. Hermes "writes JSON + commits to git" violates the same doc's "Hermes only proposes" safety boundary.

MEDIUM:
- Telegram env var name mismatch (plan: TELEGRAM_ALLOWED_CHAT_IDS, contract: TELEGRAM_ALLOWED_USERS).
- Approval callback mechanics underspecified.
- OpenRouter already in registry — work is model ID validation, not registry add.
- 1-hour DeepSeek benchmark too thin.
- Joe SEO Coolify single-point-of-failure underplayed.
- Existing 49d open follow-ups skipped in v3 sequencing.
- "3 days + 2 weeks" estimate not credible without install/security/integration/validation breakdown.
v3 — Gemini 3.1 Pro (3 HIGH, 2 MEDIUM, REFINE)
HIGH:
1. Windows-vs-Linux mismatch on Option E. Tools have hardcoded F:/Agentic-OS/ and /c/Python314/python.exe. Linux Coolify breaks them.
2. Telegram webhook can't pass Cloudflare Bearer auth. Telegram API doesn't carry custom Bearer headers. Exempt the webhook path; use Telegram's X-Telegram-Bot-Api-Secret-Token header.
3. Compliance gate substitution is legal risk. Even if fixtures pass technically, Calum must explicitly sign off.

MEDIUM:
- Morning brief stale-state (re-check pre-conditions before mutation).
- SQL location affects network access mapping.

LOW:
- Telegram webhook cleanup when rotating tokens (deleteWebhook on old token).
v2 — Codex 5.5 (1 HIGH, 8 MEDIUM, REFINE)
HIGH:
1. Execution substrate undefined. Joe SEO n8n won't have F:/Agentic-OS/, Python 3.14, .env, etc.

MEDIUM (highlights):
- Joe SEO open loops (49d) must close before workflow migration starts.
- Shadow-run "7 days" weak for low-frequency events — define cutover by time AND sample count.
- Gate workflow contracts underspecified (need JSON I/O schema + PASS/WARN/BLOCK fixtures per gate).
- HITL approval URLs need signed one-time tokens + expiry + replay protection.
- RESEARCH-NOTES.md "to be created" — should exist before approval.
v2 — Gemini 3.1 Pro (3 HIGH, 3 MEDIUM, REFINE)
HIGH:
1. Phase 1 shadow-run protocol — no technical mechanism defined to route same trigger to both Claude and n8n simultaneously.
2. Phase 0 credentials — "all secrets in n8n's credential store" splits the auth SSOT.
3. All phases — missing concrete verification commands.

MEDIUM:
- Concrete file path for CI script.
- Phase 1 compliance gate must WRAP existing Python script, not re-implement.
- Self-answer the 49-checks open question.