Plan went through two QC rounds. v2 of PLAN.md was reviewed by Codex+Gemini and refined. Then your six points of feedback (Telegram, DeepSeek, morning brief, Hermes meta-orchestration, file SoT, Railway/Cloudflare) produced CHANGES-v3.md, which also went through Codex+Gemini; both said REFINE. The reviewers caught real architecture mismatches I missed; the refined v4 sits below in Section 3.
| Decision | State | Note |
|---|---|---|
| Migration strategy: Strangler Fig, bottom-up | CONFIRMED | Anthropic / Microsoft Conductor / Vellum consensus pattern. Industry-validated. |
| HITL channel: Telegram (was Google Chat) | CONFIRMED 2026-05-19 | 2-way comms; needs webhook auth fix (see §3 #1). |
| Workflows-as-code path: F:/Agentic-OS/n8n/workflows/ | CONFIRMED | Nightly CI export via tools/n8n_workflow_sync.py. |
| Session tracking home: 49d (no new 49e) | CONFIRMED 2026-05-19 | 49e is now GSD Spine M2 rollout (different workstream). |
| Runner contract | REVERTED to A | v3 had moved to Option E (Joe SEO Coolify). Gemini HIGH: Windows-native tools break on Linux. Revert to Option A (Calum local + Tailscale) for Phase 0-3. |
| PAUSED HITL carveout | DEFAULT ON | Revisit after 4 weeks of stable ENABLE HITL. |
| GSD Spine M2 integration (NEW) | NEEDS YOUR CALL | Use the Spine as the canonical task surface for this migration? See §2. |
data.json is a cache, not the SoT; Hermes git-write contradicts the safety boundary.

You mentioned 49E is rolling out M2 of the GSD Task Spine. I confirmed this by reading projects/briefs/gsd-task-spine/.planning/2026-05-16-m2-freshness.md and 2026-05-19-m5-surface-migration.md. The Spine is a SQLite task store at F:/Agentic-OS/.command-centre/tasks.db, with a CLI at tools/tasks.py. M1 shipped 2026-05-16; M2 adds the trust layer; M5 ingests 4 existing surfaces into it.
What M2 brings to this migration:

- done becomes a stable state, only left via a deliberate reopen.
- definition_of_done column + goal-brief command (folds in the native /goal feature).
- external_ref="v4-n8n:phase0:close-49d-loops" etc. for this workstream's tasks. Freshness + verification gates apply naturally. Replaces maintaining a separate task list for this workstream.

Net: M2 is helpful infrastructure for this migration. It gives us a trusted SQL task substrate that didn't exist when I wrote v3. If you approve, I'll bake Spine integration into PLAN.md v3 alongside the v4 refinements.
M2 tasks (per .planning/2026-05-16-m2-freshness.md):

- Task 1 · verify_task: record a verification outcome
- Task 2 · reopen_task: controlled state transition
- Task 3 · reap_task: daily sweep removes expired ephemerals
- Task 4 · freshness.py: pure-function freshness signals
- Task 5 · reconcile.py: daily sweep entry point
- Task 6 · snapshot.py: atomic SQLite backups
- Task 7 · CLI gains verify/reopen/reconcile/snapshot/dod/goal-brief subcommands
- Task 8 · freshness-annotated get/list output
- Task 9 · definition_of_done column (schema migration)
- Task 10 · write path to set/refine DoD
- Task 11 · goal-brief command (emits /goal-ready brief from a task)

M5 surface migration (next):

- Surface 1 · Markdown pending-tasks registry → ingest
- Surface 2 · Session-file open loops → ingest
- Surface 3 · Claude Code TaskCreate/Update → hook writes
- Surface 4 · Creative Production Tracker Sheet → ingest
Post-QC refinement of your six feedback points. Two were downgraded out of the Phase 0 substrate (Codex+Gemini both flagged the risk of bundling them in): DeepSeek substitution moves to a Phase 5+ benchmark workstream, and Hermes becomes a parallel R&D pilot.
| # | Change | v4 verdict | What changed from v3 |
|---|---|---|---|
| 0 | 49d open follow-ups (entry gate) | RESTORED | Must close BEFORE Phase 0: env-var rename, --vps flag, n8n MCP repoint, 15-min health-check cron. Codex flagged this was missing from v3 sequencing. |
| 1 | Telegram HITL | KEEP w/ 4 fixes | (a) Use TELEGRAM_ALLOWED_USERS per existing contract; (b) Telegram webhook route exempted from Cloudflare Bearer auth, uses Telegram's X-Telegram-Bot-Api-Secret-Token instead; (c) callbacks hit n8n directly via baked-in webhookId; (d) deleteWebhook on old token if rotating. |
| 2 | DeepSeek V4 via OpenRouter | DOWNGRADED | REMOVED compliance scan + GAQL gen from Phase 0-3 routing. ALL LLM substitution moves to a separate Phase 5+ benchmark workstream, gated on your explicit sign-off + locked fixtures per task class. Keyword clustering is the lowest-risk first candidate. Reason: swapping the compliance reasoner mid-migration is a legal risk (FSA/ASA/NHC), per both reviewers. |
| 3 | Morning brief cron | KEEP w/ 2 fixes | (a) Read Sheet via canonical tooling OR run pull_pages_registry_from_sheet.py first with freshness assert — NOT data.json raw (it's a cache); (b) downstream workflow re-checks state immediately before mutation (campaign still PAUSED, page still staged, etc.). Stale-state risk surfaced by both reviewers. |
| 4 | Hermes + DeepSeek orchestrate the migration | DOWNGRADED | To a quarantined R&D pilot. (a) Runs in PARALLEL with Phase 0-1, not as a dependency; (b) Hermes outputs patches/artifacts ONLY: zero git write, zero daemon, no Coolify until the pilot passes; (c) pilot scope: ONE workflow (canonical-facts gate), 50+ fixtures over 7 days; (d) promotion gate at end of Phase 1; (e) revised estimate: ~5 days for the pilot. If the pilot fails → stay with Claude in-session for migration work and accept the cost. To be honest: v3 reversed my own VIDEO-RESEARCH.md rejection without explaining why. The reason for the reversal: you explicitly asked to try Hermes, and a pilot is the cheap way to test the thesis without committing the substrate. |
| 5 | File/JSON+SQL as SoT (clarification) | RESOLVED via §2 | SQL is the GSD Spine (M1 shipped, M2 rolling). 0.5 day to update PLAN.md cross-cutting rule #9 with the four state surfaces. |
| 6 | Hosting decision | REVERTED to A | v3 recommended Option E (Joe SEO Coolify). Gemini HIGH: tools are Windows-native (F:/Agentic-OS/, /c/Python314/python.exe) — Linux Coolify breaks them. Revert to Option A (Calum local + Tailscale) for Phase 0-3. Joe SEO Coolify hosts ONLY Linux-native services (n8n, Hermes pilot if it advances). Revisit Option D (Railway) in Phase 5+ if 24/7 independence becomes critical AND tool refactor is done. |
Distilled from the 5 video transcripts (mined verbatim via tool-youtube) and the 2-round QC. Full doc: projects/briefs/v4-n8n-migration-2026-05-18/VIDEO-RESEARCH.md.
| Claim | Status | Detail |
|---|---|---|
| DeepSeek V4 is "100x cheaper" than Claude Opus | REAL | ~86x cheaper per output token ($0.87 vs $75 per 1M). Real, but only on inference cost — NOT on quality across all task classes. |
| "Hermes is an Agentic OS" (video 4) | CONFUSING | The "(Agentic OS)" in the title is the creator's branding for HIS Hermes+Claude hybrid — NOT a reference to our project template. Different thing. |
| "Claude Agent 2.0" (video 5) | NOT A REAL PRODUCT | Anthropic ships claude-agent-sdk (real). The video doesn't actually use it — creator hand-rolled a Python loop. "Claude Agent 2.0" is creator marketing. |
| DeepSeek matches Claude on algorithmic work | PROBABLY | Verified for GAQL generation, compliance rule scans, keyword clustering. Needs benchmark before committing — Codex MEDIUM flagged the 1-hour benchmark too thin. |
| DeepSeek matches Claude on creative work | NO | Brand voice, 4-class lexicon, ad copy nuance — Claude stays. Per your point #4 2026-05-19. |
| Hermes is the right framework for our migration | UNKNOWN | Demos were personal productivity (lead gen, morning briefs). NOT structured engineering. Pilot is the cheap way to find out. |
Conductor (Opus, plans) → Worker (DeepSeek, bulk execution) → Critic (Gemini/GPT, shipping verdict). Same shape as our Cascade Gate B (Codex + Gemini + Opus + Sonnets) but cheaper. Worth considering for Gate B in Phase 5+ after migration stable. Don't bundle now.
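The Conductor → Worker → Critic shape can be sketched as three plain callables. This is a minimal, model-agnostic sketch: `plan`, `work`, and `critique` are hypothetical stand-ins for the Opus, DeepSeek, and Gemini/GPT calls, and the feedback loop with `max_rounds` is an assumption, not something the videos specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    ship: bool
    notes: str


def conductor_worker_critic(
    goal: str,
    plan: Callable[[str], list[str]],          # conductor: goal -> steps
    work: Callable[[str], str],                # worker: step -> artifact
    critique: Callable[[list[str]], Verdict],  # critic: artifacts -> verdict
    max_rounds: int = 2,
) -> tuple[list[str], Verdict]:
    """Plan, run each step on the cheap worker, and let an independent critic decide."""
    steps = plan(goal)
    artifacts: list[str] = []
    verdict = Verdict(False, "not run")
    for _ in range(max_rounds):
        artifacts = [work(step) for step in steps]
        verdict = critique(artifacts)
        if verdict.ship:
            break
        # On rejection, hand the critic's notes back to the conductor for a revised plan.
        steps = plan(f"{goal}\nCritic notes: {verdict.notes}")
    return artifacts, verdict
```

The expensive model is called once per round for planning; the bulk of tokens go through `work`, which is where the cost saving would come from.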
phase1_check_canonical_facts.workflow.json

All planning artefacts in F:/Agentic-OS/projects/briefs/v4-n8n-migration-2026-05-18/:

- PLAN.md: v2 plan, post-QC (12-week strangler-fig)
- OPEN-DECISIONS.md: runner contract Options A/B/C (and D/E added in v3)
- VIDEO-RESEARCH.md: synthesis of 5 transcripts (real vs hyperbolic claims)
- CHANGES-v3.md: 6-change recompile with implementation specifics (pre-this-QC-round)
- QC-CACHE/codex-2026-05-18.md + gemini-2026-05-18.md: v2 QC verbatim
- video-research/*.en.md: 5 raw transcripts (audit cache)

Related project context:

- projects/briefs/gsd-task-spine/.planning/2026-05-16-m2-freshness.md: GSD M2 plan (session 49E)
- projects/briefs/gsd-task-spine/.planning/2026-05-19-m5-surface-migration.md: surface ingest plan
- context/sessions/session-49d-n8n-smoke-and-sample.md: n8n integration entry session
- reference/services/n8n.md: n8n service contract (dual host, webhookId gotcha)
- projects/briefs/calum842-vps-migration-2026-05-17/MIGRATION-PLAN.md: VPS migration parent context

Tap a button to indicate direction, type any reasoning in the box, then hit Submit at the bottom. Submission lands in F:/Agentic-OS/inbox/v4-n8n-migration/ and I'll process it next session.
Genuine uncertainties I haven't resolved. Add your thoughts via the cards in §6, or just note them here for next session.
TELEGRAM_BOT_TOKEN in .env is missing despite the contract at reference/services/telegram.md. Is the bot itself gone (need to mint a fresh one via @BotFather), or is just the key value lost from .env (rotate)? Affects the start of Change #1.

Ran 9 prompts × 5 candidate models = 45 calls (30 successful; 15 were Featherless 429s due to its 1-slot concurrency limit). Real money: $0.00150 total. Cost concern essentially solved.
| Model | Pass rate | Avg latency | Total cost |
|---|---|---|---|
| openrouter:qwen/qwen3-235b-a22b-2507 | 8/9 (89%) | 9.9s | $0.00022 |
| openrouter:deepseek/deepseek-v4-flash | 6/9 (67%) | 15.3s | $0.00125 |
| featherless:Qwen/Qwen3-Coder-480B | 3/9* | 8.9s | $0 |
| featherless:deepseek-ai/DeepSeek-R1-0528 | 1/9* | 51.6s | $0 |
| featherless:deepseek-ai/DeepSeek-V3.2 | 0/9* | 7.9s | $0 |
*Featherless rate-limited (1 slot at a time); sample biased low. When they ran, Qwen3-Coder-480B was best on n8n JSON (2/3 pass). R1-0528 is very slow (51-159s/call) — only worth it for genuine reasoning.
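The asterisked rows are biased low because rate-limited calls sit in the denominator. A minimal sketch of reporting both views side by side, assuming a hypothetical per-call record (the benchmark's actual log format is not shown here):

```python
from dataclasses import dataclass


@dataclass
class CallResult:
    model: str
    passed: bool
    rate_limited: bool  # e.g. an HTTP 429 from a 1-slot provider
    latency_s: float
    cost_usd: float


def summarize(results: list[CallResult]) -> dict[str, dict]:
    """Per-model pass rate over all attempts AND over only the calls that actually ran."""
    out: dict[str, dict] = {}
    for model in sorted({r.model for r in results}):
        rows = [r for r in results if r.model == model]
        ran = [r for r in rows if not r.rate_limited]
        passes = sum(r.passed for r in ran)
        out[model] = {
            "pass_all": f"{passes}/{len(rows)}",
            "pass_ran": f"{passes}/{len(ran)}" if ran else "n/a",
            "avg_latency_s": round(sum(r.latency_s for r in ran) / len(ran), 1) if ran else None,
            "cost_usd": sum(r.cost_usd for r in rows),
        }
    return out
```

Reporting `pass_ran` next to `pass_all` is what makes a line like "2/3 pass when they ran" visible in the table itself.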
tools/qwen_qc.py built + smoke-tested. From now on, every plan / decision / agent-prompt QC fires three reviewers in parallel — Codex 5.5 + Gemini 3.1 Pro + Qwen3-235B. A HIGH from any of the three = HIGH. Independent angles, single reconciliation by Claude.
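"A HIGH from any of the three = HIGH" is a max over severities. A minimal sketch using concurrent.futures, assuming each reviewer is wrapped as a hypothetical callable that returns (severity, finding) pairs; the real tools/qwen_qc.py interface and the Codex/Gemini wrappers are not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SEVERITY = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}
Finding = tuple[str, str]  # (severity, text)
Reviewer = Callable[[str], list[Finding]]


def three_reviewer_qc(doc: str, reviewers: dict[str, Reviewer]) -> tuple[str, dict[str, list[Finding]]]:
    """Run all reviewers in parallel; the overall verdict is the worst severity any of them reports."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, doc) for name, fn in reviewers.items()}
        findings = {name: fut.result() for name, fut in futures.items()}
    worst = max(
        (SEVERITY[sev] for items in findings.values() for sev, _ in items),
        default=0,
    )
    overall = next((label for label, rank in SEVERITY.items() if rank == worst), "CLEAN")
    # Findings stay grouped per reviewer for Claude's single reconciliation pass.
    return overall, findings
```

Threads fit here because each reviewer call is network-bound; the reduction is deliberately dumb (no voting), so one dissenting HIGH cannot be outvoted.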
tools/llm_router.py:

- Fallback triggers: FEATHERLESS_EXPIRY env date OR rate-limit OR context overflow
- Cost log: cron/logs/llm-costs.jsonl

Routing by task class (primary / fallback):

- migration_worker → featherless:Qwen3-Coder-480B / openrouter:deepseek-v4-flash
- gaql_generation → openrouter:qwen3-235b / openrouter:deepseek-v4-flash
- keyword_clustering → openrouter:qwen3-235b / featherless:Qwen3-235B
- structured_extract → openrouter:qwen3-235b / openrouter:deepseek-v4-flash
- reasoning_chain → featherless:DeepSeek-R1-0528 / openrouter:deepseek-r1
- critic_verdict → featherless:GLM-5.1 / openrouter:deepseek-v4-pro
- ad_copy / creative_brief / strategic_decision / compliance_scan → CLAUDE_DIRECT
Full results: projects/briefs/v4-n8n-migration-2026-05-18/llm-router-benchmark-2026-05-19.md
Hermes, powered by Qwen3-Coder-480B, drafts n8n workflows; the three-reviewer QC gates them; Claude applies them in-session. From the video synthesis: Hermes ↔ Claude share state through the git-tracked codebase, not by calling each other's APIs.
F:/Agentic-OS/ ── GitHub remote ──┐
│ │
┌───────┴───────┐ │
│ │ │
Hermes pulls READ WRITES via PR ───────────┘
everything (proposals only — no main commits)
│
▼
┌──────────────────────────────────────┐
│ Hermes daemon (Joe SEO Coolify) │
│ hermes.nutritionalproducts.org │
│ ┌──────────────────────────────────┐ │
│ │ Worker: Qwen3-Coder-480B (FL) │ │
│ │ Fallback: DeepSeek V4 Flash (OR)│ │
│ │ Reasoning: DeepSeek R1-0528 (FL) │ │
│ │ Critic: GLM-5.1 (FL) │ │
│ │ Memory: GSD Spine + git │ │
│ │ Trigger: Telegram + cron │ │
│ └──────────────────────────────────┘ │
└────────────┬─────────────────────────┘
│ proposes via PR
▼
┌───────────────────────────────────┐
│ Three-reviewer QC (parallel) │
│ Codex 5.5 + Gemini + Qwen │
└────────────┬──────────────────────┘
│ verdicts
▼
┌───────────────────────────────────┐
│ Claude (in-session) │
│ - reconciles HIGHs │
│ - applies the PR or rejects │
│ - moves Spine task forward │
└───────────────────────────────────┘
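The "proposals only, no main commits" boundary can also be checked mechanically before a Hermes patch reaches QC. A minimal sketch, assuming the patch has already been reduced to a list of repo-relative paths; the allow-list uses the two Hermes output locations named in this plan (n8n/workflows/proposed/ and the brief's hermes-output/ folder), and the function name is hypothetical.

```python
from pathlib import PurePosixPath

ALLOWED_PREFIXES = (
    PurePosixPath("n8n/workflows/proposed"),
    PurePosixPath("projects/briefs/v4-n8n-migration-2026-05-18/hermes-output"),
)


def disallowed_paths(changed: list[str]) -> list[str]:
    """Return every changed path that falls outside Hermes's proposal-only directories."""
    bad = []
    for raw in changed:
        p = PurePosixPath(raw)
        # Pure paths do not resolve '..', so reject escapes and absolute paths outright.
        if p.is_absolute() or ".." in p.parts:
            bad.append(raw)
            continue
        if not any(p.is_relative_to(prefix) for prefix in ALLOWED_PREFIXES):
            bad.append(raw)
    return bad
```

An empty result is a precondition for QC, not an approval; Claude still decides whether the PR is applied.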
- n8n/workflows/proposed/*.workflow.json: never n8n/workflows/ directly
- projects/briefs/v4-n8n-migration-2026-05-18/hermes-output/<task-id>/: drafts + briefs
- phase1_check_canonical_facts.workflow.json

Both rounds of Codex 5.5 + Gemini 3.1 Pro QC, full text. The v2 round addressed runner substrate / PAUSED HITL / sheet-write wrap / shadow-run mechanism / credentials SSOT / verification CLI. The v3 round addressed Telegram-Cloudflare auth / Windows-Linux mismatch / compliance substitution / Hermes contradiction / data.json SoT violation.
HIGH:

1. data.json is READ CACHE, not SoT. Morning brief reads it directly, which violates the AGENTS.md 2026-05-15 rule.
2. Routing compliance through DeepSeek mid-migration = publish-critical legal-risk change at the wrong time.
3. Hermes adoption contradicts my own VIDEO-RESEARCH.md, which explicitly rejected the framework for this use case.
4. Hermes "writes JSON + commits to git" violates the same doc's "Hermes only proposes" safety boundary.

MEDIUM:

- Telegram env var name mismatch (plan: TELEGRAM_ALLOWED_CHAT_IDS, contract: TELEGRAM_ALLOWED_USERS).
- Approval callback mechanics underspecified.
- OpenRouter already in registry; the work is model ID validation, not a registry add.
- 1-hour DeepSeek benchmark too thin.
- Joe SEO Coolify single-point-of-failure underplayed.
- Existing 49d open follow-ups skipped in v3 sequencing.
- "3 days + 2 weeks" estimate not credible without install/security/integration/validation breakdown.
HIGH:

1. Windows-vs-Linux mismatch on Option E. Tools have hardcoded F:/Agentic-OS/ and /c/Python314/python.exe. Linux Coolify breaks them.
2. Telegram webhook can't pass Cloudflare Bearer auth. Telegram API doesn't carry custom Bearer headers. Exempt the webhook path; use Telegram's X-Telegram-Bot-Api-Secret-Token header.
3. Compliance gate substitution is legal risk. Even if fixtures pass technically, Calum must explicitly sign off.

MEDIUM:

- Morning brief stale-state (re-check pre-conditions before mutation).
- SQL location affects network access mapping.

LOW:

- Telegram webhook cleanup when rotating tokens (deleteWebhook on old token).
HIGH:

1. Execution substrate undefined. Joe SEO n8n won't have F:/Agentic-OS/, Python 3.14, .env, etc.

MEDIUM (highlights):

- Joe SEO open loops (49d) must close before workflow migration starts.
- Shadow-run "7 days" weak for low-frequency events: define cutover by time AND sample count.
- Gate workflow contracts underspecified (need JSON I/O schema + PASS/WARN/BLOCK fixtures per gate).
- HITL approval URLs need signed one-time tokens + expiry + replay protection.
- RESEARCH-NOTES.md "to be created": should exist before approval.
HIGH:

1. Phase 1 shadow-run protocol: no technical mechanism defined to route the same trigger to both Claude and n8n simultaneously.
2. Phase 0 credentials: "all secrets in n8n's credential store" splits the auth SSOT.
3. All phases: missing concrete verification commands.

MEDIUM:

- Concrete file path for CI script.
- Phase 1 compliance gate must WRAP existing Python script, not re-implement.
- Self-answer the 49-checks open question.
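A minimal sketch of a shadow-run harness covering both shadow-run points raised in review: one trigger fans out to the current Claude path and the n8n candidate, only the Claude result is acted on, mismatches are recorded, and cutover requires both elapsed time and a sample count. Both handlers are hypothetical callables, the 30-sample threshold is a placeholder, and requiring zero mismatches is an assumed policy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Callable


@dataclass
class ShadowRun:
    primary: Callable[[dict], Any]    # current path (Claude); its output is the one used
    candidate: Callable[[dict], Any]  # n8n workflow under test; output is only compared
    started: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    samples: int = 0
    mismatches: list[dict] = field(default_factory=list)

    def handle(self, trigger: dict) -> Any:
        result = self.primary(trigger)
        try:
            shadow = self.candidate(trigger)
        except Exception as exc:  # a candidate crash must never affect the live path
            shadow = f"error: {exc!r}"
        self.samples += 1
        if shadow != result:
            self.mismatches.append({"trigger": trigger, "primary": result, "candidate": shadow})
        return result

    def ready_to_cut_over(self, now: datetime, min_days: int = 7, min_samples: int = 30) -> bool:
        """Time AND sample count, with zero recorded mismatches."""
        return (now - self.started >= timedelta(days=min_days)
                and self.samples >= min_samples
                and not self.mismatches)
```

Requiring both conditions addresses the low-frequency problem directly: a gate that fires twice a week cannot cut over on calendar time alone.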